West Midlands' Carbon Footprint
Understanding the Data
Internet Repositories:
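From exploration of the '1_1' sheet:
Shape of the data: the dataset contains 7106 rows and 50 columns.
Summary of the data: once the aggregate rows ('National Total', 'England Total' and the twelve '<Region> Total' rows) and 'Unallocated' are set aside, the data cover 12 regions and countries: the nine English regions plus Scotland, Wales and Northern Ireland. Counting those aggregates, the 'Region/Country' column has 27 unique values. The data span 17 calendar years, from 2005 to 2021. Variables include sector-level emissions estimates in kt CO2e, such as 'Industry Electricity', 'Industry Gas' and 'Industry Total', alongside population, area, per-capita and per-km2 figures.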
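The summary figures above can be reproduced with a few pandas calls. The sketch below is a minimal illustration, assuming (as the later cells do) that aggregate rows are those whose 'Region/Country' ends in 'Total', plus 'Unallocated'. `summarise` is a hypothetical helper and the toy rows are invented; on the real data you would pass `data_1_1_actual`.

```python
import pandas as pd

def summarise(df: pd.DataFrame) -> dict:
    """Return shape, region count (excluding aggregate rows) and year span."""
    region = df['Region/Country'].astype(str)
    # Assumption: aggregates end in 'Total' (e.g. 'National Total'); 'Unallocated' is not a region
    is_aggregate = region.str.endswith('Total') | (region == 'Unallocated')
    years = pd.to_numeric(df['Calendar Year'], errors='coerce')
    return {
        'shape': df.shape,
        'n_regions': region[~is_aggregate].nunique(),
        'n_years': years.nunique(),
        'year_span': (int(years.min()), int(years.max())),
    }

# Hypothetical toy rows, not taken from the dataset
toy = pd.DataFrame({
    'Region/Country': ['North East', 'North East', 'North East Total',
                       'Wales', 'Unallocated', 'National Total'],
    'Calendar Year': [2005, 2021, 2005, 2005, 2005, 2005],
})
print(summarise(toy))
```

On the real table the same call should report 12 regions and 17 years if the aggregate naming assumption holds.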
Process Flow
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
# Get the names of the sheets in the Excel file
sheet_names = pd.ExcelFile("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx").sheet_names
sheet_names
['Cover', 'Contents', '1_1', '1_2', '1_3', '1_4', '2_1', '3_1', '3_2', '4_1', '4_1_Notes', '4_2', '4_3', '4_4', '4_5', '5_1']
# Load the first few rows from the '1_1' sheet to explore its contents
#url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1166194/2005-21-uk-local-authority-ghg-emissions.xlsx'
#data_1_1 = pd.read_excel(url, sheet_name='1_1')
data_1_1 = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name='1_1')
data_1_1.head()
|  | Table 1.1: Local Authority territorial greenhouse gas emissions estimates 2005-2021 (kt CO2e) - Full dataset | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | ... | Unnamed: 40 | Unnamed: 41 | Unnamed: 42 | Unnamed: 43 | Unnamed: 44 | Unnamed: 45 | Unnamed: 46 | Unnamed: 47 | Unnamed: 48 | Unnamed: 49 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | This worksheet contains one table. The table c... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Freeze panes are active on this sheet. To turn... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Filters are active in cells A5 to AX5 and may ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
| 4 | North East | Darlington | Darlington | E06000005 | 2005 | 51.87311 | 114.701874 | 0.045681 | 43.073543 | 209.694209 | ... | 12.699185 | 63.7069 | 34.256022 | 5.613394 | 39.869415 | 968.661604 | 100.287 | 9.658895 | 197.4758 | 4.905217 |
5 rows × 50 columns
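# Extract the actual data, excluding the header information
# .copy() makes this an independent frame, so later column assignments do not raise chained-assignment warnings
data_1_1_actual = data_1_1.iloc[4:].copy()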
# Set the column names from the header row
data_1_1_actual.columns = data_1_1.iloc[3]
data_1_1_actual
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | North East | Darlington | Darlington | E06000005 | 2005 | 51.87311 | 114.701874 | 0.045681 | 43.073543 | 209.694209 | ... | 12.699185 | 63.7069 | 34.256022 | 5.613394 | 39.869415 | 968.661604 | 100.287 | 9.658895 | 197.4758 | 4.905217 |
| 5 | North East | Darlington | Darlington | E06000005 | 2006 | 55.398988 | 97.614091 | 0.065836 | 42.086928 | 195.165842 | ... | 11.659759 | 59.927177 | 31.802024 | 6.57408 | 38.376104 | 943.627506 | 101.509 | 9.295998 | 197.4758 | 4.778446 |
| 6 | North East | Darlington | Darlington | E06000005 | 2007 | 52.249398 | 95.167732 | 0.074503 | 43.79521 | 191.286843 | ... | 12.166507 | 59.716732 | 37.048534 | 6.81813 | 43.866664 | 925.275164 | 102.632 | 9.015465 | 197.4758 | 4.685512 |
| 7 | North East | Darlington | Darlington | E06000005 | 2008 | 51.651166 | 95.266031 | 0.055964 | 35.982972 | 182.956132 | ... | 12.55795 | 59.566044 | 7.662955 | 6.771385 | 14.43434 | 876.045542 | 103.694 | 8.448373 | 197.4758 | 4.436217 |
| 8 | North East | Darlington | Darlington | E06000005 | 2009 | 45.607413 | 82.045964 | 1.061014 | 26.863721 | 155.578112 | ... | 11.405542 | 57.689627 | 16.001037 | 6.053694 | 22.054731 | 805.592471 | 104.355 | 7.71973 | 197.4758 | 4.079449 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7105 | National Total | National Total | National Total | NaN | 2017 | 18256.184881 | 15602.265158 | 33619.941766 | 17990.618473 | 85469.010278 | ... | 9689.245642 | 51301.203756 | 15821.005923 | 5131.240623 | 20952.246546 | 437824.383087 | 66067.257699 | 6.62695 | 248717.5706 | 1.760328 |
| 7106 | National Total | National Total | National Total | NaN | 2018 | 20997.236163 | 19161.407779 | 32309.811618 | 17713.459471 | 90181.915031 | ... | 9573.565596 | 51380.522611 | 15911.984565 | 5088.278772 | 21000.263337 | 430745.509283 | 66371.006647 | 6.489965 | 248717.5706 | 1.731866 |
| 7107 | National Total | National Total | National Total | NaN | 2019 | 18512.578663 | 18433.106892 | 31951.4012 | 17364.788398 | 86261.875153 | ... | 9770.415392 | 50706.546836 | 15684.547926 | 5035.738233 | 20720.286159 | 416856.663324 | 66769.633181 | 6.243207 | 248717.5706 | 1.676024 |
| 7108 | National Total | National Total | National Total | NaN | 2020 | 15164.001049 | 17463.024978 | 30135.710515 | 17584.10729 | 80346.843833 | ... | 8890.946812 | 49220.899615 | 14304.397217 | 4915.16683 | 19219.564047 | 376807.810496 | 67044.605507 | 5.620255 | 248717.5706 | 1.515003 |
| 7109 | National Total | National Total | National Total | NaN | 2021 | 17109.121337 | 20037.052043 | 29267.622916 | 17927.304689 | 84341.100985 | ... | 9247.865828 | 50711.980197 | 13618.050957 | 5195.505715 | 18813.556673 | 399046.140782 | 67026.307 | 5.953575 | 248717.5706 | 1.604415 |
7106 rows × 50 columns
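Hard-coding `iloc[3]` for the header and `iloc[4:]` for the data works for this file, but it would break silently if the notes above the table changed. A minimal sketch, assuming the header row is the first row whose first cell equals 'Region/Country': locate it, promote it to column names, and drop everything above. `promote_header` is a hypothetical helper. Alternatively, `pd.read_excel(..., sheet_name='1_1', header=4)` should use sheet row 5 (the row the sheet notes say holds the filters, A5:AX5) as the header directly; that offset is inferred from the `head()` output above rather than confirmed.

```python
import pandas as pd

def promote_header(raw: pd.DataFrame, marker: str = 'Region/Country') -> pd.DataFrame:
    """Use the first row whose first cell equals `marker` as the header."""
    hits = (raw.iloc[:, 0] == marker).to_numpy().nonzero()[0]
    if len(hits) == 0:
        raise ValueError(f'no row starts with {marker!r}')
    pos = int(hits[0])                      # positional index of the header row
    table = raw.iloc[pos + 1:].copy()       # rows below the header, as an independent frame
    table.columns = raw.iloc[pos].tolist()  # header cells become column names
    return table.reset_index(drop=True)

# Hypothetical raw sheet: note rows, then the header row, then data rows
raw = pd.DataFrame([
    ['This worksheet contains one table.', None],
    ['Freeze panes are active on this sheet.', None],
    ['Region/Country', 'Calendar Year'],
    ['North East', 2005],
    ['North East', 2006],
])
tidy = promote_header(raw)
print(tidy)
```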
# Reset the index for the actual data
data_1_1_actual = data_1_1_actual.reset_index(drop=True)
data_1_1_actual
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | North East | Darlington | Darlington | E06000005 | 2005 | 51.87311 | 114.701874 | 0.045681 | 43.073543 | 209.694209 | ... | 12.699185 | 63.7069 | 34.256022 | 5.613394 | 39.869415 | 968.661604 | 100.287 | 9.658895 | 197.4758 | 4.905217 |
| 1 | North East | Darlington | Darlington | E06000005 | 2006 | 55.398988 | 97.614091 | 0.065836 | 42.086928 | 195.165842 | ... | 11.659759 | 59.927177 | 31.802024 | 6.57408 | 38.376104 | 943.627506 | 101.509 | 9.295998 | 197.4758 | 4.778446 |
| 2 | North East | Darlington | Darlington | E06000005 | 2007 | 52.249398 | 95.167732 | 0.074503 | 43.79521 | 191.286843 | ... | 12.166507 | 59.716732 | 37.048534 | 6.81813 | 43.866664 | 925.275164 | 102.632 | 9.015465 | 197.4758 | 4.685512 |
| 3 | North East | Darlington | Darlington | E06000005 | 2008 | 51.651166 | 95.266031 | 0.055964 | 35.982972 | 182.956132 | ... | 12.55795 | 59.566044 | 7.662955 | 6.771385 | 14.43434 | 876.045542 | 103.694 | 8.448373 | 197.4758 | 4.436217 |
| 4 | North East | Darlington | Darlington | E06000005 | 2009 | 45.607413 | 82.045964 | 1.061014 | 26.863721 | 155.578112 | ... | 11.405542 | 57.689627 | 16.001037 | 6.053694 | 22.054731 | 805.592471 | 104.355 | 7.71973 | 197.4758 | 4.079449 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7101 | National Total | National Total | National Total | NaN | 2017 | 18256.184881 | 15602.265158 | 33619.941766 | 17990.618473 | 85469.010278 | ... | 9689.245642 | 51301.203756 | 15821.005923 | 5131.240623 | 20952.246546 | 437824.383087 | 66067.257699 | 6.62695 | 248717.5706 | 1.760328 |
| 7102 | National Total | National Total | National Total | NaN | 2018 | 20997.236163 | 19161.407779 | 32309.811618 | 17713.459471 | 90181.915031 | ... | 9573.565596 | 51380.522611 | 15911.984565 | 5088.278772 | 21000.263337 | 430745.509283 | 66371.006647 | 6.489965 | 248717.5706 | 1.731866 |
| 7103 | National Total | National Total | National Total | NaN | 2019 | 18512.578663 | 18433.106892 | 31951.4012 | 17364.788398 | 86261.875153 | ... | 9770.415392 | 50706.546836 | 15684.547926 | 5035.738233 | 20720.286159 | 416856.663324 | 66769.633181 | 6.243207 | 248717.5706 | 1.676024 |
| 7104 | National Total | National Total | National Total | NaN | 2020 | 15164.001049 | 17463.024978 | 30135.710515 | 17584.10729 | 80346.843833 | ... | 8890.946812 | 49220.899615 | 14304.397217 | 4915.16683 | 19219.564047 | 376807.810496 | 67044.605507 | 5.620255 | 248717.5706 | 1.515003 |
| 7105 | National Total | National Total | National Total | NaN | 2021 | 17109.121337 | 20037.052043 | 29267.622916 | 17927.304689 | 84341.100985 | ... | 9247.865828 | 50711.980197 | 13618.050957 | 5195.505715 | 18813.556673 | 399046.140782 | 67026.307 | 5.953575 | 248717.5706 | 1.604415 |
7106 rows × 50 columns
# Display the shape of the data
data_shape = data_1_1_actual.shape
data_shape
(7106, 50)
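# Get a summary of the data (every column is still object dtype, so describe() reports count/unique/top/freq rather than numeric statistics)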
data_summary = data_1_1_actual.describe()
data_summary
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7106 | 7106 | 7106 | 6358 | 7106 | 7106.00000 | 7106.0 | 7106.0 | 7106.0 | 7106.000000 | ... | 7106.0 | 7106.0 | 7106.0 | 7106.0 | 7106.0 | 7106.000000 | 7106.0 | 7106.0 | 7106.0 | 7106.0 |
| unique | 27 | 194 | 418 | 374 | 17 | 7106.00000 | 7002.0 | 6118.0 | 7073.0 | 7106.000000 | ... | 7073.0 | 7073.0 | 7090.0 | 7073.0 | 7090.0 | 7106.000000 | 7050.0 | 7073.0 | 417.0 | 7073.0 |
| top | South East | Scotland | Darlington | E06000005 | 2005 | 51.87311 | 0.0 | 0.0 | 0.0 | 209.694209 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 968.661604 | 0.0 | 0.0 | 0.0 | 0.0 |
| freq | 1190 | 544 | 17 | 17 | 418 | 1.00000 | 105.0 | 925.0 | 34.0 | 1.000000 | ... | 34.0 | 34.0 | 17.0 | 34.0 | 17.0 | 1.000000 | 34.0 | 34.0 | 34.0 | 34.0 |
4 rows × 50 columns
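Because every column is still `object` dtype at this point (as the dtype listing shows), `describe()` returns count/unique/top/freq instead of means and quantiles. A minimal sketch, assuming the four identifier columns are the only text columns: convert every other column with `pd.to_numeric(errors='coerce')`, after which `describe()` gives numeric statistics. `TEXT_COLUMNS` and `to_numeric_columns` are hypothetical names, and the toy values are invented.

```python
import pandas as pd

# Assumption: these are the only non-numeric columns in the '1_1' table
TEXT_COLUMNS = ['Region/Country', 'Second Tier Authority', 'Local Authority', 'Local Authority Code']

def to_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every non-identifier column converted to numbers."""
    out = df.copy()
    for col in out.columns:
        if col not in TEXT_COLUMNS:
            # Unparseable entries become NaN instead of raising an error
            out[col] = pd.to_numeric(out[col], errors='coerce')
    return out

# Hypothetical toy rows stored as text, mimicking the object dtype of the real sheet
toy = pd.DataFrame({
    'Region/Country': ['North East', 'North East'],
    'Second Tier Authority': ['Darlington', 'Darlington'],
    'Local Authority': ['Darlington', 'Darlington'],
    'Local Authority Code': ['E06000005', 'E06000005'],
    'Calendar Year': ['2005', '2006'],
    'Grand Total': ['968.66', '943.63'],
}, dtype=object)
numeric = to_numeric_columns(toy)
print(numeric.dtypes)
print(numeric.describe())
```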
# Get a view of the datatypes
data_types = data_1_1_actual.dtypes
data_types
3
Region/Country object
Second Tier Authority object
Local Authority object
Local Authority Code object
Calendar Year object
Industry Electricity object
Industry Gas object
Large Industrial Installations object
Industry 'Other' object
Industry Total object
Commercial Electricity object
Commercial Gas object
Commercial 'Other' object
Commercial Total object
Public Sector Electricity object
Public Sector Gas object
Public Sector 'Other' object
Public Sector Total object
Domestic Electricity object
Domestic Gas object
Domestic 'Other' object
Domestic Total object
Road Transport (A roads) object
Road Transport (Motorways) object
Road Transport (Minor roads) object
Diesel Railways object
Transport 'Other' object
Transport Total object
Net Emissions: Forest land object
Net Emissions: Cropland object
Net Emissions: Grassland object
Net Emissions: Wetlands object
Net Emissions: Settlements object
Net Emissions: Harvested Wood Products object
Net Emissions: Indirect N2O object
LULUCF Net Emissions object
Agriculture Electricity object
Agriculture Gas object
Agriculture 'Other' object
Agriculture Livestock object
Agriculture Soils object
Agriculture Total object
Landfill object
Waste Management 'Other' object
Waste Management Total object
Grand Total object
Population ('000s, mid-year estimate) object
Per Capita Emissions (tCO2e) object
Area (km2) object
Emissions per km2 (kt CO2e) object
dtype: object
# Check for missing values
missing_values = data_1_1_actual.isnull().sum()
missing_values
3
Region/Country 0
Second Tier Authority 0
Local Authority 0
Local Authority Code 748
Calendar Year 0
Industry Electricity 0
Industry Gas 0
Large Industrial Installations 0
Industry 'Other' 0
Industry Total 0
Commercial Electricity 0
Commercial Gas 0
Commercial 'Other' 0
Commercial Total 0
Public Sector Electricity 0
Public Sector Gas 0
Public Sector 'Other' 0
Public Sector Total 0
Domestic Electricity 0
Domestic Gas 0
Domestic 'Other' 0
Domestic Total 0
Road Transport (A roads) 0
Road Transport (Motorways) 0
Road Transport (Minor roads) 0
Diesel Railways 0
Transport 'Other' 0
Transport Total 0
Net Emissions: Forest land 0
Net Emissions: Cropland 0
Net Emissions: Grassland 0
Net Emissions: Wetlands 0
Net Emissions: Settlements 0
Net Emissions: Harvested Wood Products 0
Net Emissions: Indirect N2O 0
LULUCF Net Emissions 0
Agriculture Electricity 0
Agriculture Gas 0
Agriculture 'Other' 0
Agriculture Livestock 0
Agriculture Soils 0
Agriculture Total 0
Landfill 0
Waste Management 'Other' 0
Waste Management Total 0
Grand Total 0
Population ('000s, mid-year estimate) 0
Per Capita Emissions (tCO2e) 0
Area (km2) 0
Emissions per km2 (kt CO2e) 0
dtype: int64
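The 748 missing 'Local Authority Code' values look like aggregate rows: both the 'National Total' rows and county rows such as 'Worcestershire Total' appear with NaN codes in the outputs. A minimal sketch of one way to test that hypothesis, assuming aggregates are recognisable by a name ending in 'Total' or containing 'Unallocated'. `uncoded_rows_are_aggregates` is a hypothetical helper and the toy rows are invented, so the check has not been run on the real data here.

```python
import pandas as pd

NAME_COLUMNS = ['Region/Country', 'Second Tier Authority', 'Local Authority']

def uncoded_rows_are_aggregates(df: pd.DataFrame) -> bool:
    """True if every row with a missing code has an aggregate-looking name."""
    names = df[NAME_COLUMNS].astype(str)
    # Assumption: a row is an aggregate if any name column ends in 'Total' or mentions 'Unallocated'
    looks_aggregate = names.apply(
        lambda col: col.str.endswith('Total') | col.str.contains('Unallocated')
    ).any(axis=1)
    missing_code = df['Local Authority Code'].isna()
    return bool(looks_aggregate[missing_code].all())

# Hypothetical toy rows
toy = pd.DataFrame({
    'Region/Country': ['West Midlands', 'West Midlands', 'National Total'],
    'Second Tier Authority': ['Birmingham', 'Worcestershire Total', 'National Total'],
    'Local Authority': ['Birmingham', 'Worcestershire Total', 'National Total'],
    'Local Authority Code': ['E08000025', None, None],
})
print(uncoded_rows_are_aggregates(toy))
```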
import matplotlib.pyplot as plt
# Use only the 'National Total' rows: the sheet also contains country, regional and
# county 'Total' rows, so summing every row would count the same emissions several times
national_rows = data_1_1_actual[data_1_1_actual['Region/Country'] == 'National Total']
yearly_emissions = (
    pd.to_numeric(national_rows['Grand Total'], errors='coerce')
    .groupby(national_rows['Calendar Year'])
    .sum()
)
# Plotting the trend over time
plt.figure(figsize=(12, 6))
yearly_emissions.plot(marker='o', linestyle='-', color='black')
plt.title('Total National Emissions Trend Over Time (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.grid(True)
plt.show()
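To put a single number on the trend, the change between the first and last year can be expressed relative to the first year. A minimal sketch, assuming `yearly_emissions` is a Series indexed by year as produced by the cell above; `percent_change` is a hypothetical helper and the toy values are invented.

```python
import pandas as pd

def percent_change(series: pd.Series) -> float:
    """Percentage change from the earliest to the latest index entry."""
    ordered = series.sort_index()  # make sure years run in order
    first, last = ordered.iloc[0], ordered.iloc[-1]
    return 100.0 * (last - first) / first

# Hypothetical yearly totals (kt CO2e), deliberately out of order
toy_yearly = pd.Series({2021: 400.0, 2005: 500.0, 2013: 450.0})
print(percent_change(toy_yearly))  # -20.0
```

A negative result indicates a fall in emissions over the period.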
# Convert the 'Grand Total' column to a numeric type
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html
data_1_1_actual['Grand Total'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce')
# Group by 'Region/Country' and sum the 'Grand Total' column to get total emissions for each region
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
regional_emissions = data_1_1_actual.groupby('Region/Country')['Grand Total'].sum()
# Sort the regional emissions in descending order and get the name of the first region
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
highest_emission_region = regional_emissions.sort_values(ascending=False).index[0]
highest_emission_value = regional_emissions.sort_values(ascending=False).iloc[0]
# Sort the regional emissions in ascending order and get the name of the first region
lowest_emission_region = regional_emissions.sort_values(ascending=True).index[0]
lowest_emission_value = regional_emissions.sort_values(ascending=True).iloc[0]
# Results
highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value
('National Total', 8763483.862937795, 'Unallocated', 70997.7272303627)
regional_emissions
Region/Country
East Midlands                     1.246579e+06
East Midlands Total               6.759135e+05
East of England                   1.518514e+06
East of England Total             7.986982e+05
England Total                     6.830838e+06
London                            7.105470e+05
London Total                      7.105470e+05
National Total                    8.763484e+06
North East                        4.513099e+05
North East Total                  4.513099e+05
North West                        1.419484e+06
North West Total                  9.620017e+05
Northern Ireland                  3.910081e+05
Northern Ireland Total            3.910081e+05
Scotland                          8.602071e+05
Scotland Total                    8.602071e+05
South East                        1.713013e+06
South East Total                  1.001996e+06
South West                        1.073928e+06
South West Total                  6.987242e+05
Unallocated                       7.099773e+04
Wales                             6.104329e+05
Wales Total                       6.104329e+05
West Midlands                     1.030287e+06
West Midlands Total               7.107770e+05
Yorkshire and the Humber          9.443414e+05
Yorkshire and the Humber Total    8.208705e+05
Name: Grand Total, dtype: float64
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
regional_emissions.plot(kind='bar', color='blue')
plt.title('Total Emissions by Region')
plt.xlabel('Region/Country')
plt.ylabel('Total Emissions, 2005-2021 (kt CO2e)')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.show()
# Drop 'National Total','England Total' and 'Unallocated' from the Series
regional_emissions = regional_emissions.drop(labels=['National Total','England Total', 'Unallocated'])
plt.figure(figsize=(10, 6))
regional_emissions.plot(kind='bar', color='blue')
plt.title('Total Emissions by Region')
plt.xlabel('Region/Country')
plt.ylabel('Total Emissions, 2005-2021 (kt CO2e)')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.show()
# Drop 'National Total' and 'Unallocated' from the Series
#regional_emissions = regional_emissions.drop(labels=['National Total','England Total', 'Unallocated'])
#drop labels that end with total
# Convert 'Grand Total' to numeric, coercing errors to NaN
#data_1_1_actual['Grand Total'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce')
# Exclude rows where 'Region/Country' ends with 'total'
#data_1_1_filtered = data_1_1_actual[
# ~data_1_1_actual['Region/Country'].str.lower().str.endswith('total')
#]
# Group by 'Region/Country' and sum 'Grand Total' for remaining regions
#regional_emissions = data_1_1_filtered.groupby('Region/Country')['Grand Total'].sum()
# Drop 'Unallocated' from the Series
#regional_emissions = regional_emissions.drop(labels=['Unallocated'])
# Identify the highest and lowest emission regions
#highest_emission_region = regional_emissions.idxmax()
#lowest_emission_region = regional_emissions.idxmin()
# Get the highest and lowest emission values
#highest_emission_value = regional_emissions[highest_emission_region]
#lowest_emission_value = regional_emissions[lowest_emission_region]
# Results
#highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value
# Convert the 'Grand Total' column to a numeric type
data_1_1_actual['Grand Total'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce')
# Filter out rows where 'Region/Country' ends with 'Total' or is 'Unallocated'
data_1_1_filtered = data_1_1_actual[
~data_1_1_actual['Region/Country'].str.endswith('Total') &
~data_1_1_actual['Region/Country'].str.contains('Unallocated', case=False)
]
# Group by 'Region/Country' and sum the 'Grand Total' column
regional_emissions = data_1_1_filtered.groupby('Region/Country')['Grand Total'].sum()
# Identify the regions with the highest and lowest cumulative emissions
highest_emission_region = regional_emissions.idxmax()
lowest_emission_region = regional_emissions.idxmin()
# Get the corresponding values for the highest and lowest emissions
highest_emission_value = regional_emissions[highest_emission_region]
lowest_emission_value = regional_emissions[lowest_emission_region]
# Output the results
(highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value)
('South East', 1713012.733123874, 'Northern Ireland', 391008.14327239833)
import matplotlib.pyplot as plt
# regional_emissions
# regional_emissions = data_1_1_filtered.groupby('Region/Country')['Grand Total'].sum()
plt.figure(figsize=(10, 6))
regional_emissions.plot(kind='bar', color='blue')
plt.title('Total Emissions by Region')
plt.xlabel('Region/Country')
plt.ylabel('Total Emissions, 2005-2021 (kt CO2e)')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.show()
regional_emissions
Region/Country
East Midlands               1.246579e+06
East of England             1.518514e+06
London                      7.105470e+05
North East                  4.513099e+05
North West                  1.419484e+06
Northern Ireland            3.910081e+05
Scotland                    8.602071e+05
South East                  1.713013e+06
South West                  1.073928e+06
Wales                       6.104329e+05
West Midlands               1.030287e+06
Yorkshire and the Humber    9.443414e+05
Name: Grand Total, dtype: float64
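Summing every row of a region appears to double-count two-tier areas, because a region also contains county aggregate rows whose 'Second Tier Authority' ends in 'Total' (for example 'Worcestershire Total' under West Midlands). That would explain why the West Midlands figure here (about 1.030e6) exceeds the 'West Midlands Total' figure in the earlier combined listing (about 7.108e5), while single-tier areas such as London, North East, Scotland, Wales and Northern Ireland match their 'Total' rows exactly. A minimal sketch, assuming county aggregates are exactly the rows whose 'Second Tier Authority' ends in 'Total', drops them before grouping. `regional_sums_without_county_totals` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd

def regional_sums_without_county_totals(df: pd.DataFrame) -> pd.Series:
    """Sum 'Grand Total' per region, skipping aggregate rows at both levels."""
    region = df['Region/Country'].astype(str)
    tier = df['Second Tier Authority'].astype(str)
    # Keep ordinary authorities only: no region-level 'Total', no 'Unallocated', no county 'Total'
    keep = ~region.str.endswith('Total') & (region != 'Unallocated') & ~tier.str.endswith('Total')
    totals = pd.to_numeric(df.loc[keep, 'Grand Total'], errors='coerce')
    return totals.groupby(region[keep]).sum()

# Hypothetical toy rows: one district, its county aggregate, and the regional aggregate
toy = pd.DataFrame({
    'Region/Country': ['West Midlands', 'West Midlands', 'West Midlands',
                       'West Midlands Total', 'London'],
    'Second Tier Authority': ['Birmingham', 'Worcestershire', 'Worcestershire Total',
                              'West Midlands Total', 'Camden'],
    'Grand Total': ['100', '10', '10', '110', '50'],
})
sums = regional_sums_without_county_totals(toy)
print(sums)
```

On the real data the resulting regional sums would be expected, though not verified here, to match the '<Region> Total' rows.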
#import matplotlib.pyplot as plt
#import pandas as pd
#regions = ['East Midlands', 'East of England', 'London', 'North East', 'North West',
# 'Northern Ireland', 'Scotland', 'South East', 'South West', 'Wales',
# 'West Midlands', 'Yorkshire and the Humber']
#emissions = [1.246579e+06, 1.518514e+06, 7.105470e+05, 4.513099e+05, 1.419484e+06,
# 3.910081e+05, 8.602071e+05, 1.713013e+06, 1.073928e+06, 6.104329e+05,
# 1.030287e+06, 9.443414e+05]
#emissions_series = pd.Series(emissions, index=regions)
# Define colors, with red for 'West Midlands' and blue for others
#colors = ['red' if region == 'West Midlands' else 'blue' for region in regions]
# Create the bar plot
#plt.figure(figsize=(10, 6))
#emissions_series.plot(kind='bar', color=colors)
#plt.title('Total Emissions by Region')
#plt.xlabel('Region/Country')
#plt.ylabel('Grand Total Emissions')
#plt.xticks(rotation=90)
#plt.tight_layout()
#plt.grid(axis='y')
# plt.savefig('emissions_by_region.png')
# Display the plot
#plt.show()
import matplotlib.pyplot as plt
import pandas as pd
# Sort the series in ascending order for the plot
sorted_emissions = regional_emissions.sort_values()
# Define the bar colours: 'red' for 'West Midlands', 'lightblue' for the rest
colors = ['red' if region == 'West Midlands' else 'lightblue' for region in sorted_emissions.index]
# Create the bar plot
plt.figure(figsize=(10, 6))
sorted_emissions.plot(kind='bar', color=colors)
# Add titles and labels
plt.title('Total Emissions by Region (Sorted Ascending)', fontsize=14)
plt.xlabel('Region/Country', fontsize=12)
plt.ylabel('Total Emissions, 2005-2021 (kt CO2e)', fontsize=12)
plt.xticks(rotation=90)
plt.grid(axis='y')
# Show the plot
plt.tight_layout() # Adjust layout to fit labels
plt.show()
# Convert the 'Grand Total' column to a numeric type
data_1_1_actual['Grand Total'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce')
# Group by 'Region/Country' and sum the 'Grand Total' column
regional_emissions_total = data_1_1_actual.groupby('Region/Country')['Grand Total'].sum()
# Drop 'National Total' and 'Unallocated' from the Series
regional_emissions_total = regional_emissions_total.drop(labels=['National Total', 'Unallocated'])
# Identify the regions with the highest and lowest cumulative emissions
highest_emission_region = regional_emissions_total.idxmax()
lowest_emission_region = regional_emissions_total.idxmin()
# Get the corresponding values for the highest and lowest emissions
highest_emission_value = regional_emissions_total[highest_emission_region]
lowest_emission_value = regional_emissions_total[lowest_emission_region]
# Output the results
(highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value)
('England Total', 6830837.969058414, 'Northern Ireland', 391008.14327239833)
# Convert 'Grand Total' to numeric, coercing errors to NaN
data_1_1_actual['Grand Total'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce')
# Keep only rows where 'Region/Country' ends with 'total'
data_1_1_filtered = data_1_1_actual[
data_1_1_actual['Region/Country'].str.lower().str.endswith('total')
]
# Group by 'Region/Country' and sum 'Grand Total' for remaining regions
regional_emissions_total = data_1_1_filtered.groupby('Region/Country')['Grand Total'].sum()
# Drop 'Unallocated' from the Series
#regional_emissions_total = regional_emissions_total.drop(labels=['Unallocated'])
# Drop 'National Total' from the Series
regional_emissions_total = regional_emissions_total.drop(labels=['National Total'])
# Drop 'England Total' from the Series
regional_emissions_total = regional_emissions_total.drop(labels=['England Total'])
# Identify the highest and lowest emission regions
highest_emission_region = regional_emissions_total.idxmax()
lowest_emission_region = regional_emissions_total.idxmin()
# Get the highest and lowest emission values
highest_emission_value = regional_emissions_total[highest_emission_region]
lowest_emission_value = regional_emissions_total[lowest_emission_region]
# Results
highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value
('South East Total',
1001996.0574329,
'Northern Ireland Total',
391008.14327239833)
# Sort the regional emissions in descending order and get the name of the first region
highest_emission_region = regional_emissions.sort_values(ascending=False).index[0]
highest_emission_value = regional_emissions.sort_values(ascending=False).iloc[0]
# Sort the regional emissions in ascending order and get the name of the first region
lowest_emission_region = regional_emissions.sort_values(ascending=True).index[0]
lowest_emission_value = regional_emissions.sort_values(ascending=True).iloc[0]
# Results
highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value
('South East', 1713012.733123874, 'Northern Ireland', 391008.14327239833)
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
regional_emissions_total.plot(kind='bar', color='blue')
plt.title('Total Emissions by Region Total')
plt.xlabel('Region/Country')
plt.ylabel('Total Emissions, 2005-2021 (kt CO2e)')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.show()
regional_emissions_total
Region/Country
East Midlands Total               6.759135e+05
East of England Total             7.986982e+05
London Total                      7.105470e+05
North East Total                  4.513099e+05
North West Total                  9.620017e+05
Northern Ireland Total            3.910081e+05
Scotland Total                    8.602071e+05
South East Total                  1.001996e+06
South West Total                  6.987242e+05
Wales Total                       6.104329e+05
West Midlands Total               7.107770e+05
Yorkshire and the Humber Total    8.208705e+05
Name: Grand Total, dtype: float64
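As a sanity check on the hierarchy, the twelve '<Region> Total' rows plus 'Unallocated' should add up to 'National Total'. The cumulative figures printed so far are consistent with that: the twelve regional totals sum to about 8.692e6 kt CO2e, and adding Unallocated (about 7.10e4) gives about 8.763e6, the National Total found earlier. The sketch below runs the same check year by year, assuming those are the only parts of the national figure; `hierarchy_gap` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd

def hierarchy_gap(df: pd.DataFrame) -> pd.Series:
    """Per year: National Total minus (regional 'Total' rows + 'Unallocated')."""
    region = df['Region/Country'].astype(str)
    values = pd.to_numeric(df['Grand Total'], errors='coerce')
    year = df['Calendar Year']
    is_national = region == 'National Total'
    # Assumption: the parts are every '<Region> Total' row except the two higher-level aggregates
    is_part = (
        region.str.endswith(' Total') & ~region.isin(['National Total', 'England Total'])
    ) | (region == 'Unallocated')
    national = values[is_national].groupby(year[is_national]).sum()
    parts = values[is_part].groupby(year[is_part]).sum()
    return national - parts  # values near zero mean the hierarchy adds up

# Hypothetical toy rows for a single year
toy = pd.DataFrame(
    [('North East', 2005, 100.0), ('North East Total', 2005, 100.0),
     ('Wales Total', 2005, 50.0), ('England Total', 2005, 100.0),
     ('Unallocated', 2005, 5.0), ('National Total', 2005, 155.0)],
    columns=['Region/Country', 'Calendar Year', 'Grand Total'],
)
print(hierarchy_gap(toy))
```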
# regional_emissions_total is already a pandas Series; this just gives it a new name for plotting
emissions_series = pd.Series(regional_emissions_total)
# Define the bar colours: 'red' for 'West Midlands Total', 'blue' for the rest
colors = ['red' if region == 'West Midlands Total' else 'blue' for region in emissions_series.index]
# Create the bar plot
plt.figure(figsize=(10, 6))
emissions_series.plot(kind='bar', color=colors)
# Add titles and labels
plt.title('Total Emissions by Region Total', fontsize=14)
plt.xlabel('Region/Country', fontsize=12)
plt.ylabel('Total Emissions, 2005-2021 (kt CO2e)', fontsize=12)
plt.xticks(rotation=90)
plt.grid(axis='y')
# plt.savefig('emissions_by_region_total.png')
# Show the plot
plt.tight_layout() # Adjust layout to fit labels
plt.show()
import matplotlib.pyplot as plt
import pandas as pd
# Sort the series in ascending order for the plot
sorted_emissions = regional_emissions_total.sort_values()
# Define the bar colours: 'red' for 'West Midlands Total', 'lightblue' for the rest
colors = ['red' if region == 'West Midlands Total' else 'lightblue' for region in sorted_emissions.index]
# Create the bar plot
plt.figure(figsize=(10, 6))
sorted_emissions.plot(kind='bar', color=colors)
# Add titles and labels
plt.title('Total Emissions by Region Total (Sorted Ascending)', fontsize=14)
plt.xlabel('Region/Country', fontsize=12)
plt.ylabel('Total Emissions, 2005-2021 (kt CO2e)', fontsize=12)
plt.xticks(rotation=90) # Rotate the region names for better readability
plt.grid(axis='y')
# Show the plot
plt.tight_layout() # Adjust layout to fit labels
plt.show()
West Midlands
# Filter the data for the "West Midlands" Region/Country
west_midlands_data = data_1_1_actual[data_1_1_actual['Region/Country'] == 'West Midlands']
west_midlands_data
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2040 | West Midlands | Birmingham | Birmingham | E08000025 | 2005 | 526.551509 | 366.037594 | 61.953269 | 261.620286 | 1216.162658 | ... | 2.318744 | 15.128539 | 11.122149 | 43.552592 | 54.674741 | 7059.334372 | 1014.65 | 6.957408 | 267.7912 | 26.361338 |
| 2041 | West Midlands | Birmingham | Birmingham | E08000025 | 2006 | 537.58464 | 325.085629 | 82.275024 | 254.186129 | 1199.131422 | ... | 2.519812 | 15.762343 | 10.324532 | 48.716274 | 59.040806 | 6952.112034 | 1020.843 | 6.810168 | 267.7912 | 25.960943 |
| 2042 | West Midlands | Birmingham | Birmingham | E08000025 | 2007 | 510.882763 | 311.246683 | 94.635887 | 259.93259 | 1176.697923 | ... | 2.361162 | 15.055982 | 36.275553 | 50.047079 | 86.322633 | 6800.522322 | 1029.021 | 6.60873 | 267.7912 | 25.394869 |
| 2043 | West Midlands | Birmingham | Birmingham | E08000025 | 2008 | 529.503396 | 305.588896 | 133.077044 | 207.353832 | 1175.523168 | ... | 2.487657 | 14.784999 | 30.037779 | 48.617361 | 78.655141 | 6744.677640 | 1038.98 | 6.491634 | 267.7912 | 25.18633 |
| 2044 | West Midlands | Birmingham | Birmingham | E08000025 | 2009 | 429.653608 | 281.346962 | 44.531193 | 182.413125 | 937.944889 | ... | 2.289698 | 13.508004 | 159.413026 | 47.379221 | 206.792247 | 6099.737055 | 1050.072 | 5.808875 | 267.7912 | 22.777959 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2596 | West Midlands | Worcestershire Total | Worcestershire Total | NaN | 2017 | 152.467092 | 116.944652 | 19.834311 | 217.425383 | 506.671438 | ... | 68.894191 | 386.871084 | 216.880509 | 46.866939 | 263.747448 | 3987.637970 | 596.365709 | 6.686565 | 1740.5141 | 2.291069 |
| 2597 | West Midlands | Worcestershire Total | Worcestershire Total | NaN | 2018 | 172.838333 | 131.012733 | 21.624796 | 232.232283 | 557.708145 | ... | 69.275392 | 405.306105 | 194.41849 | 46.092127 | 240.510617 | 3895.210521 | 600.345089 | 6.488286 | 1740.5141 | 2.237966 |
| 2598 | West Midlands | Worcestershire Total | Worcestershire Total | NaN | 2019 | 146.809055 | 137.062654 | 19.014284 | 197.144455 | 500.030448 | ... | 72.29286 | 383.175445 | 194.909071 | 47.087417 | 241.996488 | 3745.219387 | 604.075264 | 6.199922 | 1740.5141 | 2.151789 |
| 2599 | West Midlands | Worcestershire Total | Worcestershire Total | NaN | 2020 | 121.875589 | 115.848439 | 15.014089 | 197.552499 | 450.290616 | ... | 60.056299 | 359.954182 | 193.621385 | 44.625131 | 238.246516 | 3293.630698 | 606.317326 | 5.43219 | 1740.5141 | 1.892332 |
| 2600 | West Midlands | Worcestershire Total | Worcestershire Total | NaN | 2021 | 141.541027 | 169.279094 | 18.323263 | 205.906984 | 535.050368 | ... | 68.156731 | 373.633695 | 201.635026 | 46.49455 | 248.129576 | 3631.573462 | 604.947 | 6.003127 | 1740.5141 | 2.086495 |
561 rows × 50 columns
# Filter out rows where "Second Tier Authority" ends with "Total"
west_midlands_data_filtered = west_midlands_data[~west_midlands_data['Second Tier Authority'].str.endswith("Total")]
west_midlands_data_filtered
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2040 | West Midlands | Birmingham | Birmingham | E08000025 | 2005 | 526.551509 | 366.037594 | 61.953269 | 261.620286 | 1216.162658 | ... | 2.318744 | 15.128539 | 11.122149 | 43.552592 | 54.674741 | 7059.334372 | 1014.65 | 6.957408 | 267.7912 | 26.361338 |
| 2041 | West Midlands | Birmingham | Birmingham | E08000025 | 2006 | 537.58464 | 325.085629 | 82.275024 | 254.186129 | 1199.131422 | ... | 2.519812 | 15.762343 | 10.324532 | 48.716274 | 59.040806 | 6952.112034 | 1020.843 | 6.810168 | 267.7912 | 25.960943 |
| 2042 | West Midlands | Birmingham | Birmingham | E08000025 | 2007 | 510.882763 | 311.246683 | 94.635887 | 259.93259 | 1176.697923 | ... | 2.361162 | 15.055982 | 36.275553 | 50.047079 | 86.322633 | 6800.522322 | 1029.021 | 6.60873 | 267.7912 | 25.394869 |
| 2043 | West Midlands | Birmingham | Birmingham | E08000025 | 2008 | 529.503396 | 305.588896 | 133.077044 | 207.353832 | 1175.523168 | ... | 2.487657 | 14.784999 | 30.037779 | 48.617361 | 78.655141 | 6744.677640 | 1038.98 | 6.491634 | 267.7912 | 25.18633 |
| 2044 | West Midlands | Birmingham | Birmingham | E08000025 | 2009 | 429.653608 | 281.346962 | 44.531193 | 182.413125 | 937.944889 | ... | 2.289698 | 13.508004 | 159.413026 | 47.379221 | 206.792247 | 6099.737055 | 1050.072 | 5.808875 | 267.7912 | 22.777959 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2579 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2017 | 21.596821 | 10.207297 | 1.131147 | 48.643531 | 81.578795 | ... | 6.365818 | 33.125984 | 35.766842 | 5.064742 | 40.831584 | 499.390383 | 101.886605 | 4.901433 | 195.4038 | 2.555684 |
| 2580 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2018 | 26.603644 | 13.005476 | 1.125587 | 59.088112 | 99.82282 | ... | 6.132734 | 32.343872 | 32.129977 | 5.063163 | 37.19314 | 489.710354 | 102.301938 | 4.786912 | 195.4038 | 2.506145 |
| 2581 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2019 | 23.235559 | 17.494639 | 1.147332 | 42.889724 | 84.767255 | ... | 6.015508 | 31.772339 | 31.502365 | 4.954569 | 36.456934 | 458.909804 | 102.485512 | 4.477802 | 195.4038 | 2.34852 |
| 2582 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2020 | 17.900403 | 11.514084 | 1.128091 | 44.760307 | 75.302885 | ... | 5.358702 | 30.189129 | 31.393792 | 4.728242 | 36.122034 | 408.559610 | 102.162048 | 3.999133 | 195.4038 | 2.090848 |
| 2583 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2021 | 21.503682 | 14.971111 | 1.077544 | 43.31466 | 80.866997 | ... | 5.185539 | 30.605628 | 32.829161 | 5.01005 | 37.839211 | 436.541231 | 101.786 | 4.288814 | 195.4038 | 2.234047 |
510 rows × 50 columns
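The filter above relies on `str.endswith`, which returns a missing value rather than `True`/`False` for any row whose 'Second Tier Authority' is NaN, and such a result cannot be used directly as a boolean mask. Passing `na=False` treats missing names as "not a total" and keeps those rows. A small sketch on a toy frame; the NaN row is hypothetical, since the West Midlands subset above filtered without error.

```python
import numpy as np
import pandas as pd

# Toy frame: one regular authority, one county total, one missing name (hypothetical)
df = pd.DataFrame({"Second Tier Authority": ["Birmingham", "Worcestershire Total", np.nan],
                   "Grand Total": [1.0, 2.0, 3.0]})
# na=False makes missing names count as "does not end with Total"
is_total = df["Second Tier Authority"].str.endswith("Total", na=False)
kept = df[~is_total]
print(kept)
```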
# Pivot table creation
pivot_table = pd.pivot_table(
west_midlands_data_filtered,
values=[
'Commercial Total', 'Waste Management Total', 'Agriculture Total',
'Transport Total', 'Domestic Total', 'Public Sector Total',
'Industry Total', 'LULUCF Net Emissions','Emissions per km2 (kt CO2e)'
],
index=['Calendar Year'],
aggfunc='sum'  # string alias; recent pandas versions warn when the built-in sum is passed
)
# Transpose the pivot table to have categories as rows and years as columns
pivot_table_transposed = pivot_table.T
# Print the transposed pivot table
# print(pivot_table_transposed)
# pivot_table_transposed.to_csv('features_by_years.csv')
pivot_table
| 3 | Agriculture Total | Commercial Total | Domestic Total | Emissions per km2 (kt CO2e) | Industry Total | LULUCF Net Emissions | Public Sector Total | Transport Total | Waste Management Total |
|---|---|---|---|---|---|---|---|---|---|
| Calendar Year | |||||||||
| 2005 | 4028.975542 | 5809.331253 | 13449.536094 | 289.942804 | 10110.465071 | -317.593628 | 2157.052152 | 13501.32343 | 993.132504 |
| 2006 | 3977.611559 | 6076.632436 | 13583.317568 | 288.863353 | 10197.714268 | -326.862871 | 2068.2373 | 13305.491246 | 982.898678 |
| 2007 | 3905.02243 | 5783.889127 | 13081.835599 | 286.449175 | 10138.235254 | -364.168107 | 1940.546956 | 13427.290157 | 1704.174232 |
| 2008 | 3896.858302 | 5577.45716 | 13033.414716 | 274.077573 | 9265.801065 | -409.084036 | 1836.585244 | 12933.307202 | 1555.034766 |
| 2009 | 3807.240767 | 4712.011264 | 11835.167963 | 252.942548 | 8063.486578 | -410.381053 | 1569.812241 | 12494.989414 | 2215.670625 |
| 2010 | 3840.081371 | 5001.267074 | 12685.403118 | 265.708579 | 8605.356043 | -416.001919 | 1702.472983 | 12409.032802 | 2569.207488 |
| 2011 | 3761.345066 | 4672.226863 | 11112.642004 | 244.269938 | 8053.840463 | -437.639005 | 1543.437224 | 12275.012297 | 2099.16877 |
| 2012 | 3809.765776 | 4997.596854 | 11868.951211 | 250.674974 | 7934.255575 | -405.366106 | 1655.926145 | 12166.688081 | 2004.062529 |
| 2013 | 3722.949528 | 4772.413248 | 11531.527193 | 245.219017 | 7930.368103 | -444.5167 | 1632.970841 | 12070.626129 | 1965.555586 |
| 2014 | 3809.989355 | 4075.739643 | 9798.620535 | 226.299273 | 7683.166895 | -443.113774 | 1407.210297 | 12321.712167 | 1760.41934 |
| 2015 | 3805.79878 | 3700.093353 | 9562.443689 | 221.343256 | 7258.445476 | -467.135531 | 1343.820223 | 12586.051221 | 1781.099069 |
| 2016 | 3692.000037 | 3288.671641 | 9036.525528 | 211.633827 | 6871.729735 | -419.905408 | 1185.686926 | 12802.317734 | 1976.630326 |
| 2017 | 3766.397313 | 2555.056076 | 8585.065579 | 205.073094 | 6795.989583 | -447.164862 | 1247.662167 | 12775.515791 | 2203.734087 |
| 2018 | 3806.417076 | 1707.805299 | 8419.354441 | 199.917735 | 7338.841582 | -432.569368 | 1400.427797 | 12562.288533 | 2002.438545 |
| 2019 | 3752.726686 | 1378.760231 | 8136.337663 | 189.296645 | 6823.598748 | -457.906092 | 1289.674005 | 12222.766602 | 1934.145194 |
| 2020 | 3573.008128 | 1094.025601 | 7960.604044 | 169.355461 | 6329.377088 | -461.502795 | 1210.227228 | 9941.294068 | 1725.606782 |
| 2021 | 3639.604571 | 1289.487328 | 8254.300582 | 181.918944 | 7025.526089 | -458.060889 | 1333.317247 | 11009.606016 | 1643.074363 |
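Note that this pivot sums 'Emissions per km2 (kt CO2e)' across authorities, and a sum of per-authority densities is not the region's emissions density. In this dataset the per-km² column appears to equal 'Grand Total' divided by 'Area (km2)' (for Birmingham in 2005, 7059.33 / 267.79 ≈ 26.36). An area-weighted regional figure divides total emissions by total area instead. The sketch below contrasts the two on two hypothetical authorities.

```python
import pandas as pd

# Two hypothetical authorities: emissions (kt CO2e) and area (km2)
df = pd.DataFrame({
    "Grand Total": [100.0, 50.0],
    "Area (km2)": [10.0, 100.0],
})
df["Emissions per km2 (kt CO2e)"] = df["Grand Total"] / df["Area (km2)"]

# Summing densities (what the pivot does): 10.0 + 0.5
summed_density = df["Emissions per km2 (kt CO2e)"].sum()
# Area-weighted regional density: total emissions / total area = 150 / 110
regional_density = df["Grand Total"].sum() / df["Area (km2)"].sum()
print(summed_density, round(regional_density, 3))  # 10.5 1.364
```

For the real data, the same idea would be `west_midlands_data_filtered.groupby('Calendar Year')[['Grand Total', 'Area (km2)']].sum()`, followed by dividing the first column by the second.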
pivot_table_transposed
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | |||||||||||||||||
| Agriculture Total | 4028.975542 | 3977.611559 | 3905.022430 | 3896.858302 | 3807.240767 | 3840.081371 | 3761.345066 | 3809.765776 | 3722.949528 | 3809.989355 | 3805.798780 | 3692.000037 | 3766.397313 | 3806.417076 | 3752.726686 | 3573.008128 | 3639.604571 |
| Commercial Total | 5809.331253 | 6076.632436 | 5783.889127 | 5577.457160 | 4712.011264 | 5001.267074 | 4672.226863 | 4997.596854 | 4772.413248 | 4075.739643 | 3700.093353 | 3288.671641 | 2555.056076 | 1707.805299 | 1378.760231 | 1094.025601 | 1289.487328 |
| Domestic Total | 13449.536094 | 13583.317568 | 13081.835599 | 13033.414716 | 11835.167963 | 12685.403118 | 11112.642004 | 11868.951211 | 11531.527193 | 9798.620535 | 9562.443689 | 9036.525528 | 8585.065579 | 8419.354441 | 8136.337663 | 7960.604044 | 8254.300582 |
| Emissions per km2 (kt CO2e) | 289.942804 | 288.863353 | 286.449175 | 274.077573 | 252.942548 | 265.708579 | 244.269938 | 250.674974 | 245.219017 | 226.299273 | 221.343256 | 211.633827 | 205.073094 | 199.917735 | 189.296645 | 169.355461 | 181.918944 |
| Industry Total | 10110.465071 | 10197.714268 | 10138.235254 | 9265.801065 | 8063.486578 | 8605.356043 | 8053.840463 | 7934.255575 | 7930.368103 | 7683.166895 | 7258.445476 | 6871.729735 | 6795.989583 | 7338.841582 | 6823.598748 | 6329.377088 | 7025.526089 |
| LULUCF Net Emissions | -317.593628 | -326.862871 | -364.168107 | -409.084036 | -410.381053 | -416.001919 | -437.639005 | -405.366106 | -444.516700 | -443.113774 | -467.135531 | -419.905408 | -447.164862 | -432.569368 | -457.906092 | -461.502795 | -458.060889 |
| Public Sector Total | 2157.052152 | 2068.237300 | 1940.546956 | 1836.585244 | 1569.812241 | 1702.472983 | 1543.437224 | 1655.926145 | 1632.970841 | 1407.210297 | 1343.820223 | 1185.686926 | 1247.662167 | 1400.427797 | 1289.674005 | 1210.227228 | 1333.317247 |
| Transport Total | 13501.323430 | 13305.491246 | 13427.290157 | 12933.307202 | 12494.989414 | 12409.032802 | 12275.012297 | 12166.688081 | 12070.626129 | 12321.712167 | 12586.051221 | 12802.317734 | 12775.515791 | 12562.288533 | 12222.766602 | 9941.294068 | 11009.606016 |
| Waste Management Total | 993.132504 | 982.898678 | 1704.174232 | 1555.034766 | 2215.670625 | 2569.207488 | 2099.168770 | 2004.062529 | 1965.555586 | 1760.419340 | 1781.099069 | 1976.630326 | 2203.734087 | 2002.438545 | 1934.145194 | 1725.606782 | 1643.074363 |
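With sectors as rows and years as columns, the change between the first and last year can be expressed as a percentage, (value in 2021 - value in 2005) / value in 2005 × 100. A sketch using two rows copied from the table above:

```python
import pandas as pd

# Sector totals (kt CO2e) copied from the transposed pivot table above
sectors = pd.DataFrame(
    {2005: [5809.331253, 13501.323430], 2021: [1289.487328, 11009.606016]},
    index=["Commercial Total", "Transport Total"],
)
# Percentage change relative to the 2005 baseline
pct_change = (sectors[2021] - sectors[2005]) / sectors[2005] * 100
print(pct_change.round(1))
```

This gives roughly -77.8 % for commercial and -18.5 % for transport emissions between 2005 and 2021. Applied to the full table, the expression would be `(pivot_table_transposed[2021] - pivot_table_transposed[2005]) / pivot_table_transposed[2005] * 100`, assuming the year column labels are integers as the display suggests. 'LULUCF Net Emissions' is negative, so a positive percentage there means a larger net sink, not higher emissions.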
pivot_table_transposed.corr()
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Calendar Year | |||||||||||||||||
| 2005 | 1.000000 | 0.999707 | 0.998732 | 0.998613 | 0.991479 | 0.991547 | 0.991912 | 0.993618 | 0.993859 | 0.983923 | 0.975687 | 0.961511 | 0.946135 | 0.939583 | 0.932480 | 0.947603 | 0.948853 |
| 2006 | 0.999707 | 1.000000 | 0.998403 | 0.998334 | 0.989789 | 0.990788 | 0.990035 | 0.992557 | 0.992489 | 0.980433 | 0.971191 | 0.955886 | 0.939172 | 0.931815 | 0.924216 | 0.940674 | 0.941725 |
| 2007 | 0.998732 | 0.998403 | 1.000000 | 0.998963 | 0.994393 | 0.994761 | 0.995211 | 0.995505 | 0.996049 | 0.987501 | 0.979535 | 0.966698 | 0.952439 | 0.945399 | 0.938189 | 0.951952 | 0.953161 |
| 2008 | 0.998613 | 0.998334 | 0.998963 | 1.000000 | 0.996148 | 0.996680 | 0.995657 | 0.997844 | 0.997708 | 0.986340 | 0.978960 | 0.965865 | 0.950809 | 0.941955 | 0.935568 | 0.950719 | 0.950351 |
| 2009 | 0.991479 | 0.989789 | 0.994393 | 0.996148 | 1.000000 | 0.998480 | 0.999338 | 0.999391 | 0.999718 | 0.993185 | 0.989331 | 0.980747 | 0.969569 | 0.959834 | 0.955375 | 0.966119 | 0.964785 |
| 2010 | 0.991547 | 0.990788 | 0.994761 | 0.996680 | 0.998480 | 1.000000 | 0.996746 | 0.998603 | 0.998446 | 0.986271 | 0.980466 | 0.969608 | 0.957166 | 0.947864 | 0.942720 | 0.958749 | 0.955883 |
| 2011 | 0.991912 | 0.990035 | 0.995211 | 0.995657 | 0.999338 | 0.996746 | 1.000000 | 0.998421 | 0.999245 | 0.996167 | 0.992308 | 0.984251 | 0.973556 | 0.964359 | 0.959465 | 0.967777 | 0.967906 |
| 2012 | 0.993618 | 0.992557 | 0.995505 | 0.997844 | 0.999391 | 0.998603 | 0.998421 | 1.000000 | 0.999808 | 0.990443 | 0.985482 | 0.975169 | 0.961938 | 0.950899 | 0.945959 | 0.958274 | 0.956731 |
| 2013 | 0.993859 | 0.992489 | 0.996049 | 0.997708 | 0.999718 | 0.998446 | 0.999245 | 0.999808 | 1.000000 | 0.992578 | 0.987924 | 0.978229 | 0.965972 | 0.956038 | 0.951104 | 0.962484 | 0.961484 |
| 2014 | 0.983923 | 0.980433 | 0.987501 | 0.986340 | 0.993185 | 0.986271 | 0.996167 | 0.990443 | 0.992578 | 1.000000 | 0.998836 | 0.994492 | 0.987349 | 0.980165 | 0.976078 | 0.977006 | 0.979746 |
| 2015 | 0.975687 | 0.971191 | 0.979535 | 0.978960 | 0.989331 | 0.980466 | 0.992308 | 0.985482 | 0.987924 | 0.998836 | 1.000000 | 0.998237 | 0.993195 | 0.985935 | 0.983098 | 0.980691 | 0.983483 |
| 2016 | 0.961511 | 0.955886 | 0.966698 | 0.965865 | 0.980747 | 0.969608 | 0.984251 | 0.975169 | 0.978229 | 0.994492 | 0.998237 | 1.000000 | 0.997802 | 0.990656 | 0.988950 | 0.981532 | 0.984991 |
| 2017 | 0.946135 | 0.939172 | 0.952439 | 0.950809 | 0.969569 | 0.957166 | 0.973556 | 0.961938 | 0.965972 | 0.987349 | 0.993195 | 0.997802 | 1.000000 | 0.996425 | 0.995866 | 0.986805 | 0.990171 |
| 2018 | 0.939583 | 0.931815 | 0.945399 | 0.941955 | 0.959834 | 0.947864 | 0.964359 | 0.950899 | 0.956038 | 0.980165 | 0.985935 | 0.990656 | 0.996425 | 1.000000 | 0.999545 | 0.993473 | 0.997028 |
| 2019 | 0.932480 | 0.924216 | 0.938189 | 0.935568 | 0.955375 | 0.942720 | 0.959465 | 0.945959 | 0.951104 | 0.976078 | 0.983098 | 0.988950 | 0.995866 | 0.999545 | 1.000000 | 0.993427 | 0.996302 |
| 2020 | 0.947603 | 0.940674 | 0.951952 | 0.950719 | 0.966119 | 0.958749 | 0.967777 | 0.958274 | 0.962484 | 0.977006 | 0.980691 | 0.981532 | 0.986805 | 0.993473 | 0.993427 | 1.000000 | 0.998725 |
| 2021 | 0.948853 | 0.941725 | 0.953161 | 0.950351 | 0.964785 | 0.955883 | 0.967906 | 0.956731 | 0.961484 | 0.979746 | 0.983483 | 0.984991 | 0.990171 | 0.997028 | 0.996302 | 0.998725 | 1.000000 |
pivot_table.corr()
| 3 | Agriculture Total | Commercial Total | Domestic Total | Emissions per km2 (kt CO2e) | Industry Total | LULUCF Net Emissions | Public Sector Total | Transport Total | Waste Management Total |
|---|---|---|---|---|---|---|---|---|---|
| 3 | |||||||||
| Agriculture Total | 1.000000 | 0.775494 | 0.793451 | 0.860274 | 0.880980 | 0.833005 | 0.861581 | 0.852000 | -0.477497 |
| Commercial Total | 0.775494 | 1.000000 | 0.959494 | 0.974504 | 0.866052 | 0.729578 | 0.847188 | 0.680485 | -0.267074 |
| Domestic Total | 0.793451 | 0.959494 | 1.000000 | 0.983375 | 0.928145 | 0.796254 | 0.934487 | 0.608518 | -0.313520 |
| Emissions per km2 (kt CO2e) | 0.860274 | 0.974504 | 0.983375 | 1.000000 | 0.942496 | 0.818817 | 0.925586 | 0.732963 | -0.321745 |
| Industry Total | 0.880980 | 0.866052 | 0.928145 | 0.942496 | 1.000000 | 0.889868 | 0.975990 | 0.684668 | -0.527468 |
| LULUCF Net Emissions | 0.833005 | 0.729578 | 0.796254 | 0.818817 | 0.889868 | 1.000000 | 0.887573 | 0.665688 | -0.608115 |
| Public Sector Total | 0.861581 | 0.847188 | 0.934487 | 0.925586 | 0.975990 | 0.887573 | 1.000000 | 0.604193 | -0.542619 |
| Transport Total | 0.852000 | 0.680485 | 0.608518 | 0.732963 | 0.684668 | 0.665688 | 0.604193 | 1.000000 | -0.237519 |
| Waste Management Total | -0.477497 | -0.267074 | -0.313520 | -0.321745 | -0.527468 | -0.608115 | -0.542619 | -0.237519 | 1.000000 |
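Each coefficient here is computed over only 17 yearly totals, and most sectors decline over that period, so a shared downward trend can push coefficients towards 1 even when the year-to-year movements are unrelated. One common check, sketched below on two made-up declining series, is to correlate the year-on-year changes (`DataFrame.diff()`) as well as the levels.

```python
import pandas as pd

# Two hypothetical series that both decline over time
df = pd.DataFrame({
    "a": [10.0, 9.5, 8.0, 7.8, 6.0, 5.5],
    "b": [20.0, 18.0, 17.5, 15.0, 14.5, 12.0],
})
# Correlation of the levels, driven largely by the shared trend
level_corr = df.corr().loc["a", "b"]
# Correlation of year-on-year changes; the leading NaN row from diff() is ignored
change_corr = df.diff().corr().loc["a", "b"]
print(round(level_corr, 2), round(change_corr, 2))
```

Here the levels correlate strongly and positively while the changes move in opposite directions. On the real data the equivalent would be `pivot_table.diff().corr()`; this is a rough check on trend-driven correlation, not a full time-series analysis.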
# Generate the correlation matrix
corr = pivot_table.corr()
# Set up the matplotlib figure
plt.figure(figsize=(12, 10))
# Draw the heatmap with annotated coefficients and square cells
sns.heatmap(corr, annot=True, fmt=".2f", cmap='coolwarm', square=True, linewidths=.5, cbar_kws={"shrink": .5})
# Add title
plt.title('West Midlands Correlation Matrix Heatmap')
# Show the plot
plt.show()
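A correlation matrix is symmetric, so its upper triangle repeats the lower one. `sns.heatmap` accepts a boolean `mask` argument, and cells where the mask is `True` are not drawn. A minimal sketch of building such a mask with NumPy, shown on a made-up 3×3 matrix and previewed with `DataFrame.mask` so the effect is visible without plotting:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 correlation matrix
corr = pd.DataFrame([[1.0, 0.8, -0.3],
                     [0.8, 1.0, 0.1],
                     [-0.3, 0.1, 1.0]],
                    index=list("xyz"), columns=list("xyz"))
# True on and above the diagonal: those cells duplicate the lower triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Preview: masked cells become NaN, which is what the heatmap would leave blank
lower_only = corr.mask(mask)
print(lower_only)
```

On the real matrix this would be `sns.heatmap(corr, mask=mask, annot=True, ...)`. Using `np.triu(..., k=1)` instead keeps the diagonal of ones visible.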
Values close to 1 indicate a strong positive correlation, meaning that as one feature increases, the other tends to increase as well.
Values close to -1 indicate a strong negative correlation, meaning that as one feature increases, the other tends to decrease.
Values close to 0 indicate little to no linear relationship between the features.
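These coefficients are Pearson correlations, the default method of `DataFrame.corr()`: the covariance of two columns divided by the product of their standard deviations. The sketch below computes it from scratch (the `pearson` helper and its inputs are illustrative), reproduces the two extremes described above, and agrees with pandas on arbitrary data.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Pearson correlation: sum of co-deviations over the product of root sums of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

x = [1.0, 2.0, 3.0, 4.0]
print(pearson(x, [2.0, 4.0, 6.0, 8.0]))  # 1.0 (perfect positive)
print(pearson(x, [8.0, 6.0, 4.0, 2.0]))  # -1.0 (perfect negative)

# Cross-check against pandas on arbitrary values
s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])
t = pd.Series([2.0, 7.0, 1.0, 8.0, 2.0])
print(pearson(s, t), s.corr(t))
```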
# List of columns to keep
columns_to_keep = ['Region/Country', 'Second Tier Authority', 'Local Authority', 'Local Authority Code', 'Calendar Year', 'LULUCF Net Emissions']
# Add columns that end with "Total"
columns_to_keep.extend([col for col in west_midlands_data_filtered.columns if col.endswith("Total")])
west_midlands_filtered_columns = west_midlands_data_filtered[columns_to_keep]
west_midlands_filtered_columns.to_csv('west_midlands_filtered_columns.csv')
west_midlands_filtered_columns
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | LULUCF Net Emissions | Industry Total | Commercial Total | Public Sector Total | Domestic Total | Transport Total | Agriculture Total | Waste Management Total | Grand Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2040 | West Midlands | Birmingham | Birmingham | E08000025 | 2005 | 13.873543 | 1216.162658 | 1170.441473 | 558.70517 | 2359.376951 | 1670.971299 | 15.128539 | 54.674741 | 7059.334372 |
| 2041 | West Midlands | Birmingham | Birmingham | E08000025 | 2006 | 13.855886 | 1199.131422 | 1168.499559 | 519.502282 | 2365.574895 | 1610.744841 | 15.762343 | 59.040806 | 6952.112034 |
| 2042 | West Midlands | Birmingham | Birmingham | E08000025 | 2007 | 13.477209 | 1176.697923 | 1111.410745 | 492.80575 | 2283.42429 | 1621.32779 | 15.055982 | 86.322633 | 6800.522322 |
| 2043 | West Midlands | Birmingham | Birmingham | E08000025 | 2008 | 12.998567 | 1175.523168 | 1143.259246 | 489.438236 | 2284.490311 | 1545.527971 | 14.784999 | 78.655141 | 6744.677640 |
| 2044 | West Midlands | Birmingham | Birmingham | E08000025 | 2009 | 12.151837 | 937.944889 | 945.363805 | 423.187511 | 2062.886425 | 1497.902337 | 13.508004 | 206.792247 | 6099.737055 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2579 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2017 | -22.99366 | 81.578795 | 39.204973 | 18.44841 | 154.11273 | 155.081565 | 33.125984 | 40.831584 | 499.390383 |
| 2580 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2018 | -22.677647 | 99.82282 | 23.783474 | 18.025695 | 149.76546 | 151.453539 | 32.343872 | 37.19314 | 489.710354 |
| 2581 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2019 | -22.823968 | 84.767255 | 22.492546 | 15.106118 | 144.241608 | 146.896973 | 31.772339 | 36.456934 | 458.909804 |
| 2582 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2020 | -23.150931 | 75.302885 | 16.067331 | 14.061786 | 141.095909 | 118.871468 | 30.189129 | 36.122034 | 408.559610 |
| 2583 | West Midlands | Worcestershire | Wyre Forest | E07000239 | 2021 | -22.956472 | 80.866997 | 17.888049 | 14.58203 | 144.139487 | 133.576301 | 30.605628 | 37.839211 | 436.541231 |
510 rows × 14 columns
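The loop that collects columns ending in "Total" can also be written with `DataFrame.filter(regex=...)`, where the `$` anchor matches only at the end of the name. A quick check on an empty frame carrying a few of the dataset's column names shows both approaches pick the same columns. 'LULUCF Net Emissions' does not end in "Total", which is why the code above adds it explicitly.

```python
import pandas as pd

# Empty frame with a subset of the dataset's column names
df = pd.DataFrame(columns=["Local Authority", "Industry Gas", "Industry Total",
                           "Grand Total", "LULUCF Net Emissions"])
by_loop = [c for c in df.columns if c.endswith("Total")]
by_filter = df.filter(regex=r"Total$").columns.tolist()
print(by_loop == by_filter, by_filter)  # True ['Industry Total', 'Grand Total']
```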
# Pivot 'Grand Total' emissions into a Local Authority x Calendar Year table
west_midlands_local_authority_emissions = pd.pivot_table(west_midlands_filtered_columns, values='Grand Total', index=['Local Authority'], columns=['Calendar Year'])
west_midlands_local_authority_emissions
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Authority | |||||||||||||||||
| Birmingham | 7059.334372 | 6952.112034 | 6800.522322 | 6744.677640 | 6099.737055 | 6412.804284 | 5805.907414 | 6094.847957 | 5928.649539 | 5283.528982 | 5187.812112 | 5018.990855 | 4952.980766 | 4694.297182 | 4500.378294 | 4177.186378 | 4480.654084 |
| Bromsgrove | 956.642291 | 966.027098 | 1014.325992 | 966.508457 | 915.239428 | 943.043254 | 865.875640 | 885.439318 | 875.957613 | 847.917226 | 839.358149 | 828.019311 | 799.333806 | 765.570975 | 756.273104 | 655.139864 | 711.760256 |
| Cannock Chase | 588.979929 | 597.874966 | 584.212829 | 554.965633 | 534.431424 | 565.871617 | 516.643905 | 523.651492 | 521.293200 | 463.806795 | 448.224377 | 432.415392 | 438.056787 | 428.182712 | 400.378294 | 360.878619 | 364.042820 |
| Coventry | 2285.257172 | 2244.459643 | 2109.501041 | 2028.053404 | 1835.162506 | 1962.611578 | 1766.002096 | 1881.254900 | 1815.756821 | 1664.671871 | 1647.571233 | 1574.337656 | 1548.946359 | 1496.667926 | 1428.701726 | 1280.312165 | 1344.000349 |
| Dudley | 1983.590348 | 1972.154700 | 1888.505634 | 1818.345968 | 1657.763334 | 1778.665829 | 1603.028674 | 1645.915361 | 1619.695684 | 1452.515600 | 1412.477913 | 1356.141083 | 1347.120418 | 1295.213550 | 1217.885198 | 1079.756656 | 1163.033149 |
| East Staffordshire | 1287.270750 | 1298.775799 | 1260.546128 | 1245.328814 | 1170.588639 | 1228.732489 | 1160.773439 | 1165.877455 | 1143.246946 | 1050.973296 | 1014.291935 | 988.598271 | 981.240089 | 966.664142 | 903.141998 | 810.544121 | 853.521292 |
| Herefordshire, County of | 2091.684474 | 2090.207781 | 2038.853224 | 1975.986634 | 1872.500149 | 1994.484316 | 1849.438619 | 1892.331768 | 1847.560317 | 1777.587181 | 1696.464807 | 1660.690961 | 1618.084774 | 1598.639996 | 1527.132610 | 1364.626064 | 1472.865799 |
| Lichfield | 1004.513597 | 1022.789515 | 1013.242837 | 993.374843 | 955.231477 | 981.016859 | 931.692962 | 938.137617 | 935.001562 | 875.810423 | 863.532062 | 850.456232 | 851.408592 | 837.842706 | 813.911260 | 691.404937 | 754.890754 |
| Malvern Hills | 772.773159 | 799.621230 | 848.442583 | 797.397569 | 757.938462 | 782.645008 | 723.285139 | 726.094378 | 717.922272 | 695.654404 | 688.580938 | 676.822279 | 655.588099 | 642.098370 | 619.590085 | 538.434311 | 601.444378 |
| Newcastle-under-Lyme | 1121.822896 | 1113.654899 | 1107.267011 | 1087.895669 | 1042.573930 | 1083.119199 | 1024.749999 | 1024.635624 | 1021.924809 | 939.404697 | 969.297348 | 944.169090 | 929.997916 | 911.784056 | 875.110449 | 784.361909 | 844.190037 |
| North Warwickshire | 1147.478903 | 1203.041332 | 1226.193958 | 1194.492043 | 1121.940224 | 1191.290419 | 1085.180037 | 1110.955808 | 1077.930006 | 1065.117153 | 1060.260000 | 1029.729619 | 1025.527236 | 1001.285910 | 961.051901 | 823.867578 | 919.630410 |
| Nuneaton and Bedworth | 813.097075 | 815.719471 | 861.239859 | 828.813779 | 772.443801 | 765.017013 | 677.858160 | 704.027486 | 684.424974 | 638.996823 | 634.593829 | 606.356718 | 597.691045 | 593.546030 | 563.796985 | 505.938444 | 540.854821 |
| Redditch | 585.906245 | 591.548822 | 632.697674 | 593.920918 | 533.689582 | 577.553981 | 522.724477 | 528.282173 | 500.489462 | 468.059313 | 444.165840 | 421.000712 | 406.630410 | 394.836203 | 383.634496 | 332.680616 | 357.787785 |
| Rugby | 2330.213067 | 2373.288078 | 2651.893434 | 2411.372020 | 2278.320929 | 2283.729045 | 2334.076177 | 2124.141199 | 2157.994109 | 2133.777044 | 1993.474564 | 2109.935757 | 2000.143439 | 2041.847091 | 1988.209932 | 1853.518649 | 2026.454350 |
| Sandwell | 2289.664199 | 2311.288395 | 2271.410144 | 2149.609456 | 1915.683984 | 2035.995284 | 1862.670040 | 1942.089900 | 1917.089725 | 1748.667558 | 1785.651958 | 1577.895063 | 1587.577139 | 1534.233142 | 1453.591967 | 1315.105902 | 1401.088684 |
| Shropshire | 3788.039158 | 3772.811560 | 3701.754274 | 3605.369762 | 3330.136985 | 3537.762726 | 3277.426402 | 3379.256702 | 3297.401331 | 3129.205145 | 3092.862487 | 3029.587506 | 2980.901584 | 2986.369707 | 2882.366244 | 2645.075805 | 2773.370585 |
| Solihull | 1735.803608 | 1801.630633 | 1757.701820 | 1667.849693 | 1565.457992 | 1683.199281 | 1588.399097 | 1667.768397 | 1627.491397 | 1492.473058 | 1518.294020 | 1475.326158 | 1427.841084 | 1410.483026 | 1327.805102 | 1145.680549 | 1226.181599 |
| South Staffordshire | 1142.941882 | 1174.670664 | 1167.283617 | 1104.231784 | 1101.010671 | 1130.681369 | 1083.159234 | 1088.067517 | 1082.128759 | 1028.457756 | 1000.357801 | 1002.784932 | 1034.087269 | 1000.812994 | 973.519842 | 870.868301 | 954.939263 |
| Stafford | 1593.609395 | 1616.541260 | 1569.472252 | 1566.564484 | 1533.426323 | 1571.727852 | 1488.504814 | 1541.704997 | 1510.290365 | 1422.200740 | 1409.541939 | 1378.918704 | 1343.494927 | 1325.675278 | 1270.246497 | 1101.396010 | 1166.734395 |
| Staffordshire Moorlands | 1776.923545 | 1766.082186 | 1742.557987 | 1656.608649 | 1559.741550 | 1655.713572 | 1603.156284 | 1544.428454 | 1564.011891 | 1546.833793 | 1525.699941 | 1497.001650 | 1413.357126 | 1475.730299 | 1386.199936 | 1274.232906 | 1332.991944 |
| Stoke-on-Trent | 2041.728131 | 2025.362057 | 1977.856972 | 1900.940420 | 1777.333094 | 1764.352853 | 1581.980925 | 1584.479240 | 1577.177978 | 1606.524597 | 1550.265296 | 1507.331663 | 1261.650147 | 1223.080833 | 1153.929946 | 1020.164468 | 1127.356084 |
| Stratford-on-Avon | 1501.480094 | 1513.735452 | 1541.393414 | 1488.757386 | 1386.832211 | 1436.903217 | 1352.934775 | 1446.126565 | 1414.636678 | 1431.627278 | 1322.595288 | 1324.473512 | 1317.037715 | 1314.087715 | 1223.348640 | 1042.939059 | 1120.139928 |
| Tamworth | 475.068557 | 474.697517 | 448.760723 | 436.457918 | 421.964679 | 440.200738 | 404.836659 | 414.618735 | 406.701043 | 363.694164 | 347.631884 | 326.917746 | 317.951130 | 318.231567 | 290.820254 | 254.020488 | 268.838288 |
| Telford and Wrekin | 1659.122170 | 1685.979582 | 1682.256128 | 1530.383297 | 1419.460432 | 1502.746044 | 1426.316317 | 1494.792938 | 1385.950669 | 1274.826827 | 1254.843761 | 1160.254419 | 1154.173983 | 1183.265359 | 1119.790339 | 952.853681 | 1002.876199 |
| Walsall | 1841.638129 | 1829.666727 | 1772.033335 | 1673.938533 | 1488.754727 | 1624.627863 | 1525.409270 | 1549.103445 | 1531.029950 | 1380.643572 | 1300.406164 | 1250.226818 | 1242.130066 | 1201.211275 | 1119.598232 | 1009.154672 | 1088.510816 |
| Warwick | 1391.720595 | 1400.843990 | 1409.130871 | 1362.768363 | 1273.867598 | 1277.589915 | 1173.237382 | 1216.060961 | 1177.002231 | 1071.245807 | 1097.625980 | 1065.827135 | 1033.018272 | 1010.087886 | 964.526191 | 797.752307 | 892.125608 |
| Wolverhampton | 1677.179103 | 1629.700831 | 1587.437458 | 1536.582527 | 1398.213887 | 1493.486284 | 1370.306537 | 1402.082164 | 1367.714396 | 1232.190047 | 1196.524784 | 1135.863036 | 1090.199900 | 1060.553002 | 989.441812 | 917.369776 | 985.990590 |
| Worcester | 643.542399 | 616.473972 | 657.726991 | 608.166828 | 572.476104 | 601.253296 | 557.011139 | 575.389895 | 551.444982 | 496.471837 | 475.843472 | 447.628667 | 420.763103 | 406.786384 | 381.230693 | 344.888325 | 375.993686 |
| Wychavon | 1447.518551 | 1504.532587 | 1542.140801 | 1467.778761 | 1367.405328 | 1425.701966 | 1300.782922 | 1309.649052 | 1311.106432 | 1260.514044 | 1248.715685 | 1232.972760 | 1205.932168 | 1196.208234 | 1145.581204 | 1013.927972 | 1148.046126 |
| Wyre Forest | 697.678624 | 699.747403 | 750.464334 | 692.233167 | 628.671292 | 664.291809 | 616.665147 | 630.668567 | 612.868789 | 570.347427 | 543.650712 | 522.982812 | 499.390383 | 489.710354 | 458.909804 | 408.559610 | 436.541231 |
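`pd.pivot_table` aggregates with the mean by default, so if an authority appeared more than once in a year, the table above would silently show the average. The filtered frame has 510 rows for 30 authorities over 17 years, which suggests exactly one row each, and the assumption is cheap to verify. The sketch below uses a made-up frame with a deliberate duplicate to show the difference between `pivot_table`, which averages, and `DataFrame.pivot`, which raises a `ValueError`.

```python
import pandas as pd

# Made-up long-format data with a deliberate duplicate (A, 2005)
df = pd.DataFrame({
    "Local Authority": ["A", "A", "B"],
    "Calendar Year": [2005, 2005, 2005],
    "Grand Total": [10.0, 30.0, 5.0],
})
has_dupes = df.duplicated(["Local Authority", "Calendar Year"]).any()

# pivot_table averages the duplicates without complaint (default aggfunc is 'mean')
averaged = pd.pivot_table(df, values="Grand Total", index="Local Authority",
                          columns="Calendar Year").loc["A", 2005]

# DataFrame.pivot refuses to aggregate and raises instead
raised = False
try:
    df.pivot(index="Local Authority", columns="Calendar Year", values="Grand Total")
except ValueError:
    raised = True

print(has_dupes, averaged, raised)  # True 20.0 True
```

On the real data, `west_midlands_filtered_columns.duplicated(['Local Authority', 'Calendar Year']).any()` should return `False` before pivoting.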
# Calculate the total emissions across all local authorities for each year
# (drop any existing 'Total' row first so re-running the cell does not double-count)
yearly_totals = west_midlands_local_authority_emissions.drop('Total', errors='ignore').sum()
# Append the totals as a new row to the DataFrame
west_midlands_local_authority_emissions.loc['Total'] = yearly_totals
west_midlands_local_authority_emissions
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Authority | |||||||||||||||||
| Birmingham | 7059.334372 | 6952.112034 | 6800.522322 | 6744.677640 | 6099.737055 | 6412.804284 | 5805.907414 | 6094.847957 | 5928.649539 | 5283.528982 | 5187.812112 | 5018.990855 | 4952.980766 | 4694.297182 | 4500.378294 | 4177.186378 | 4480.654084 |
| Bromsgrove | 956.642291 | 966.027098 | 1014.325992 | 966.508457 | 915.239428 | 943.043254 | 865.875640 | 885.439318 | 875.957613 | 847.917226 | 839.358149 | 828.019311 | 799.333806 | 765.570975 | 756.273104 | 655.139864 | 711.760256 |
| Cannock Chase | 588.979929 | 597.874966 | 584.212829 | 554.965633 | 534.431424 | 565.871617 | 516.643905 | 523.651492 | 521.293200 | 463.806795 | 448.224377 | 432.415392 | 438.056787 | 428.182712 | 400.378294 | 360.878619 | 364.042820 |
| Coventry | 2285.257172 | 2244.459643 | 2109.501041 | 2028.053404 | 1835.162506 | 1962.611578 | 1766.002096 | 1881.254900 | 1815.756821 | 1664.671871 | 1647.571233 | 1574.337656 | 1548.946359 | 1496.667926 | 1428.701726 | 1280.312165 | 1344.000349 |
| Dudley | 1983.590348 | 1972.154700 | 1888.505634 | 1818.345968 | 1657.763334 | 1778.665829 | 1603.028674 | 1645.915361 | 1619.695684 | 1452.515600 | 1412.477913 | 1356.141083 | 1347.120418 | 1295.213550 | 1217.885198 | 1079.756656 | 1163.033149 |
| East Staffordshire | 1287.270750 | 1298.775799 | 1260.546128 | 1245.328814 | 1170.588639 | 1228.732489 | 1160.773439 | 1165.877455 | 1143.246946 | 1050.973296 | 1014.291935 | 988.598271 | 981.240089 | 966.664142 | 903.141998 | 810.544121 | 853.521292 |
| Herefordshire, County of | 2091.684474 | 2090.207781 | 2038.853224 | 1975.986634 | 1872.500149 | 1994.484316 | 1849.438619 | 1892.331768 | 1847.560317 | 1777.587181 | 1696.464807 | 1660.690961 | 1618.084774 | 1598.639996 | 1527.132610 | 1364.626064 | 1472.865799 |
| Lichfield | 1004.513597 | 1022.789515 | 1013.242837 | 993.374843 | 955.231477 | 981.016859 | 931.692962 | 938.137617 | 935.001562 | 875.810423 | 863.532062 | 850.456232 | 851.408592 | 837.842706 | 813.911260 | 691.404937 | 754.890754 |
| Malvern Hills | 772.773159 | 799.621230 | 848.442583 | 797.397569 | 757.938462 | 782.645008 | 723.285139 | 726.094378 | 717.922272 | 695.654404 | 688.580938 | 676.822279 | 655.588099 | 642.098370 | 619.590085 | 538.434311 | 601.444378 |
| Newcastle-under-Lyme | 1121.822896 | 1113.654899 | 1107.267011 | 1087.895669 | 1042.573930 | 1083.119199 | 1024.749999 | 1024.635624 | 1021.924809 | 939.404697 | 969.297348 | 944.169090 | 929.997916 | 911.784056 | 875.110449 | 784.361909 | 844.190037 |
| North Warwickshire | 1147.478903 | 1203.041332 | 1226.193958 | 1194.492043 | 1121.940224 | 1191.290419 | 1085.180037 | 1110.955808 | 1077.930006 | 1065.117153 | 1060.260000 | 1029.729619 | 1025.527236 | 1001.285910 | 961.051901 | 823.867578 | 919.630410 |
| Nuneaton and Bedworth | 813.097075 | 815.719471 | 861.239859 | 828.813779 | 772.443801 | 765.017013 | 677.858160 | 704.027486 | 684.424974 | 638.996823 | 634.593829 | 606.356718 | 597.691045 | 593.546030 | 563.796985 | 505.938444 | 540.854821 |
| Redditch | 585.906245 | 591.548822 | 632.697674 | 593.920918 | 533.689582 | 577.553981 | 522.724477 | 528.282173 | 500.489462 | 468.059313 | 444.165840 | 421.000712 | 406.630410 | 394.836203 | 383.634496 | 332.680616 | 357.787785 |
| Rugby | 2330.213067 | 2373.288078 | 2651.893434 | 2411.372020 | 2278.320929 | 2283.729045 | 2334.076177 | 2124.141199 | 2157.994109 | 2133.777044 | 1993.474564 | 2109.935757 | 2000.143439 | 2041.847091 | 1988.209932 | 1853.518649 | 2026.454350 |
| Sandwell | 2289.664199 | 2311.288395 | 2271.410144 | 2149.609456 | 1915.683984 | 2035.995284 | 1862.670040 | 1942.089900 | 1917.089725 | 1748.667558 | 1785.651958 | 1577.895063 | 1587.577139 | 1534.233142 | 1453.591967 | 1315.105902 | 1401.088684 |
| Shropshire | 3788.039158 | 3772.811560 | 3701.754274 | 3605.369762 | 3330.136985 | 3537.762726 | 3277.426402 | 3379.256702 | 3297.401331 | 3129.205145 | 3092.862487 | 3029.587506 | 2980.901584 | 2986.369707 | 2882.366244 | 2645.075805 | 2773.370585 |
| Solihull | 1735.803608 | 1801.630633 | 1757.701820 | 1667.849693 | 1565.457992 | 1683.199281 | 1588.399097 | 1667.768397 | 1627.491397 | 1492.473058 | 1518.294020 | 1475.326158 | 1427.841084 | 1410.483026 | 1327.805102 | 1145.680549 | 1226.181599 |
| South Staffordshire | 1142.941882 | 1174.670664 | 1167.283617 | 1104.231784 | 1101.010671 | 1130.681369 | 1083.159234 | 1088.067517 | 1082.128759 | 1028.457756 | 1000.357801 | 1002.784932 | 1034.087269 | 1000.812994 | 973.519842 | 870.868301 | 954.939263 |
| Stafford | 1593.609395 | 1616.541260 | 1569.472252 | 1566.564484 | 1533.426323 | 1571.727852 | 1488.504814 | 1541.704997 | 1510.290365 | 1422.200740 | 1409.541939 | 1378.918704 | 1343.494927 | 1325.675278 | 1270.246497 | 1101.396010 | 1166.734395 |
| Staffordshire Moorlands | 1776.923545 | 1766.082186 | 1742.557987 | 1656.608649 | 1559.741550 | 1655.713572 | 1603.156284 | 1544.428454 | 1564.011891 | 1546.833793 | 1525.699941 | 1497.001650 | 1413.357126 | 1475.730299 | 1386.199936 | 1274.232906 | 1332.991944 |
| Stoke-on-Trent | 2041.728131 | 2025.362057 | 1977.856972 | 1900.940420 | 1777.333094 | 1764.352853 | 1581.980925 | 1584.479240 | 1577.177978 | 1606.524597 | 1550.265296 | 1507.331663 | 1261.650147 | 1223.080833 | 1153.929946 | 1020.164468 | 1127.356084 |
| Stratford-on-Avon | 1501.480094 | 1513.735452 | 1541.393414 | 1488.757386 | 1386.832211 | 1436.903217 | 1352.934775 | 1446.126565 | 1414.636678 | 1431.627278 | 1322.595288 | 1324.473512 | 1317.037715 | 1314.087715 | 1223.348640 | 1042.939059 | 1120.139928 |
| Tamworth | 475.068557 | 474.697517 | 448.760723 | 436.457918 | 421.964679 | 440.200738 | 404.836659 | 414.618735 | 406.701043 | 363.694164 | 347.631884 | 326.917746 | 317.951130 | 318.231567 | 290.820254 | 254.020488 | 268.838288 |
| Telford and Wrekin | 1659.122170 | 1685.979582 | 1682.256128 | 1530.383297 | 1419.460432 | 1502.746044 | 1426.316317 | 1494.792938 | 1385.950669 | 1274.826827 | 1254.843761 | 1160.254419 | 1154.173983 | 1183.265359 | 1119.790339 | 952.853681 | 1002.876199 |
| Walsall | 1841.638129 | 1829.666727 | 1772.033335 | 1673.938533 | 1488.754727 | 1624.627863 | 1525.409270 | 1549.103445 | 1531.029950 | 1380.643572 | 1300.406164 | 1250.226818 | 1242.130066 | 1201.211275 | 1119.598232 | 1009.154672 | 1088.510816 |
| Warwick | 1391.720595 | 1400.843990 | 1409.130871 | 1362.768363 | 1273.867598 | 1277.589915 | 1173.237382 | 1216.060961 | 1177.002231 | 1071.245807 | 1097.625980 | 1065.827135 | 1033.018272 | 1010.087886 | 964.526191 | 797.752307 | 892.125608 |
| Wolverhampton | 1677.179103 | 1629.700831 | 1587.437458 | 1536.582527 | 1398.213887 | 1493.486284 | 1370.306537 | 1402.082164 | 1367.714396 | 1232.190047 | 1196.524784 | 1135.863036 | 1090.199900 | 1060.553002 | 989.441812 | 917.369776 | 985.990590 |
| Worcester | 643.542399 | 616.473972 | 657.726991 | 608.166828 | 572.476104 | 601.253296 | 557.011139 | 575.389895 | 551.444982 | 496.471837 | 475.843472 | 447.628667 | 420.763103 | 406.786384 | 381.230693 | 344.888325 | 375.993686 |
| Wychavon | 1447.518551 | 1504.532587 | 1542.140801 | 1467.778761 | 1367.405328 | 1425.701966 | 1300.782922 | 1309.649052 | 1311.106432 | 1260.514044 | 1248.715685 | 1232.972760 | 1205.932168 | 1196.208234 | 1145.581204 | 1013.927972 | 1148.046126 |
| Wyre Forest | 697.678624 | 699.747403 | 750.464334 | 692.233167 | 628.671292 | 664.291809 | 616.665147 | 630.668567 | 612.868789 | 570.347427 | 543.650712 | 522.982812 | 499.390383 | 489.710354 | 458.909804 | 408.559610 | 436.541231 |
| Total | 49732.222417 | 49865.040184 | 49616.825647 | 47689.374420 | 44287.997799 | 46396.818960 | 43080.033683 | 44031.880064 | 43181.893928 | 40413.744458 | 39570.616279 | 38433.656517 | 37482.255734 | 36805.003905 | 35080.103036 | 31372.640144 | 33736.855308 |
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
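# Alias of the authority-by-year table (rows: local authorities, columns: years); no transpose is applied here
data_transposed = west_midlands_local_authority_emissions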
data_transposed
| Local Authority / Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Birmingham | 7059.334372 | 6952.112034 | 6800.522322 | 6744.677640 | 6099.737055 | 6412.804284 | 5805.907414 | 6094.847957 | 5928.649539 | 5283.528982 | 5187.812112 | 5018.990855 | 4952.980766 | 4694.297182 | 4500.378294 | 4177.186378 | 4480.654084 |
| Bromsgrove | 956.642291 | 966.027098 | 1014.325992 | 966.508457 | 915.239428 | 943.043254 | 865.875640 | 885.439318 | 875.957613 | 847.917226 | 839.358149 | 828.019311 | 799.333806 | 765.570975 | 756.273104 | 655.139864 | 711.760256 |
| Cannock Chase | 588.979929 | 597.874966 | 584.212829 | 554.965633 | 534.431424 | 565.871617 | 516.643905 | 523.651492 | 521.293200 | 463.806795 | 448.224377 | 432.415392 | 438.056787 | 428.182712 | 400.378294 | 360.878619 | 364.042820 |
| Coventry | 2285.257172 | 2244.459643 | 2109.501041 | 2028.053404 | 1835.162506 | 1962.611578 | 1766.002096 | 1881.254900 | 1815.756821 | 1664.671871 | 1647.571233 | 1574.337656 | 1548.946359 | 1496.667926 | 1428.701726 | 1280.312165 | 1344.000349 |
| Dudley | 1983.590348 | 1972.154700 | 1888.505634 | 1818.345968 | 1657.763334 | 1778.665829 | 1603.028674 | 1645.915361 | 1619.695684 | 1452.515600 | 1412.477913 | 1356.141083 | 1347.120418 | 1295.213550 | 1217.885198 | 1079.756656 | 1163.033149 |
| East Staffordshire | 1287.270750 | 1298.775799 | 1260.546128 | 1245.328814 | 1170.588639 | 1228.732489 | 1160.773439 | 1165.877455 | 1143.246946 | 1050.973296 | 1014.291935 | 988.598271 | 981.240089 | 966.664142 | 903.141998 | 810.544121 | 853.521292 |
| Herefordshire, County of | 2091.684474 | 2090.207781 | 2038.853224 | 1975.986634 | 1872.500149 | 1994.484316 | 1849.438619 | 1892.331768 | 1847.560317 | 1777.587181 | 1696.464807 | 1660.690961 | 1618.084774 | 1598.639996 | 1527.132610 | 1364.626064 | 1472.865799 |
| Lichfield | 1004.513597 | 1022.789515 | 1013.242837 | 993.374843 | 955.231477 | 981.016859 | 931.692962 | 938.137617 | 935.001562 | 875.810423 | 863.532062 | 850.456232 | 851.408592 | 837.842706 | 813.911260 | 691.404937 | 754.890754 |
| Malvern Hills | 772.773159 | 799.621230 | 848.442583 | 797.397569 | 757.938462 | 782.645008 | 723.285139 | 726.094378 | 717.922272 | 695.654404 | 688.580938 | 676.822279 | 655.588099 | 642.098370 | 619.590085 | 538.434311 | 601.444378 |
| Newcastle-under-Lyme | 1121.822896 | 1113.654899 | 1107.267011 | 1087.895669 | 1042.573930 | 1083.119199 | 1024.749999 | 1024.635624 | 1021.924809 | 939.404697 | 969.297348 | 944.169090 | 929.997916 | 911.784056 | 875.110449 | 784.361909 | 844.190037 |
| North Warwickshire | 1147.478903 | 1203.041332 | 1226.193958 | 1194.492043 | 1121.940224 | 1191.290419 | 1085.180037 | 1110.955808 | 1077.930006 | 1065.117153 | 1060.260000 | 1029.729619 | 1025.527236 | 1001.285910 | 961.051901 | 823.867578 | 919.630410 |
| Nuneaton and Bedworth | 813.097075 | 815.719471 | 861.239859 | 828.813779 | 772.443801 | 765.017013 | 677.858160 | 704.027486 | 684.424974 | 638.996823 | 634.593829 | 606.356718 | 597.691045 | 593.546030 | 563.796985 | 505.938444 | 540.854821 |
| Redditch | 585.906245 | 591.548822 | 632.697674 | 593.920918 | 533.689582 | 577.553981 | 522.724477 | 528.282173 | 500.489462 | 468.059313 | 444.165840 | 421.000712 | 406.630410 | 394.836203 | 383.634496 | 332.680616 | 357.787785 |
| Rugby | 2330.213067 | 2373.288078 | 2651.893434 | 2411.372020 | 2278.320929 | 2283.729045 | 2334.076177 | 2124.141199 | 2157.994109 | 2133.777044 | 1993.474564 | 2109.935757 | 2000.143439 | 2041.847091 | 1988.209932 | 1853.518649 | 2026.454350 |
| Sandwell | 2289.664199 | 2311.288395 | 2271.410144 | 2149.609456 | 1915.683984 | 2035.995284 | 1862.670040 | 1942.089900 | 1917.089725 | 1748.667558 | 1785.651958 | 1577.895063 | 1587.577139 | 1534.233142 | 1453.591967 | 1315.105902 | 1401.088684 |
| Shropshire | 3788.039158 | 3772.811560 | 3701.754274 | 3605.369762 | 3330.136985 | 3537.762726 | 3277.426402 | 3379.256702 | 3297.401331 | 3129.205145 | 3092.862487 | 3029.587506 | 2980.901584 | 2986.369707 | 2882.366244 | 2645.075805 | 2773.370585 |
| Solihull | 1735.803608 | 1801.630633 | 1757.701820 | 1667.849693 | 1565.457992 | 1683.199281 | 1588.399097 | 1667.768397 | 1627.491397 | 1492.473058 | 1518.294020 | 1475.326158 | 1427.841084 | 1410.483026 | 1327.805102 | 1145.680549 | 1226.181599 |
| South Staffordshire | 1142.941882 | 1174.670664 | 1167.283617 | 1104.231784 | 1101.010671 | 1130.681369 | 1083.159234 | 1088.067517 | 1082.128759 | 1028.457756 | 1000.357801 | 1002.784932 | 1034.087269 | 1000.812994 | 973.519842 | 870.868301 | 954.939263 |
| Stafford | 1593.609395 | 1616.541260 | 1569.472252 | 1566.564484 | 1533.426323 | 1571.727852 | 1488.504814 | 1541.704997 | 1510.290365 | 1422.200740 | 1409.541939 | 1378.918704 | 1343.494927 | 1325.675278 | 1270.246497 | 1101.396010 | 1166.734395 |
| Staffordshire Moorlands | 1776.923545 | 1766.082186 | 1742.557987 | 1656.608649 | 1559.741550 | 1655.713572 | 1603.156284 | 1544.428454 | 1564.011891 | 1546.833793 | 1525.699941 | 1497.001650 | 1413.357126 | 1475.730299 | 1386.199936 | 1274.232906 | 1332.991944 |
| Stoke-on-Trent | 2041.728131 | 2025.362057 | 1977.856972 | 1900.940420 | 1777.333094 | 1764.352853 | 1581.980925 | 1584.479240 | 1577.177978 | 1606.524597 | 1550.265296 | 1507.331663 | 1261.650147 | 1223.080833 | 1153.929946 | 1020.164468 | 1127.356084 |
| Stratford-on-Avon | 1501.480094 | 1513.735452 | 1541.393414 | 1488.757386 | 1386.832211 | 1436.903217 | 1352.934775 | 1446.126565 | 1414.636678 | 1431.627278 | 1322.595288 | 1324.473512 | 1317.037715 | 1314.087715 | 1223.348640 | 1042.939059 | 1120.139928 |
| Tamworth | 475.068557 | 474.697517 | 448.760723 | 436.457918 | 421.964679 | 440.200738 | 404.836659 | 414.618735 | 406.701043 | 363.694164 | 347.631884 | 326.917746 | 317.951130 | 318.231567 | 290.820254 | 254.020488 | 268.838288 |
| Telford and Wrekin | 1659.122170 | 1685.979582 | 1682.256128 | 1530.383297 | 1419.460432 | 1502.746044 | 1426.316317 | 1494.792938 | 1385.950669 | 1274.826827 | 1254.843761 | 1160.254419 | 1154.173983 | 1183.265359 | 1119.790339 | 952.853681 | 1002.876199 |
| Walsall | 1841.638129 | 1829.666727 | 1772.033335 | 1673.938533 | 1488.754727 | 1624.627863 | 1525.409270 | 1549.103445 | 1531.029950 | 1380.643572 | 1300.406164 | 1250.226818 | 1242.130066 | 1201.211275 | 1119.598232 | 1009.154672 | 1088.510816 |
| Warwick | 1391.720595 | 1400.843990 | 1409.130871 | 1362.768363 | 1273.867598 | 1277.589915 | 1173.237382 | 1216.060961 | 1177.002231 | 1071.245807 | 1097.625980 | 1065.827135 | 1033.018272 | 1010.087886 | 964.526191 | 797.752307 | 892.125608 |
| Wolverhampton | 1677.179103 | 1629.700831 | 1587.437458 | 1536.582527 | 1398.213887 | 1493.486284 | 1370.306537 | 1402.082164 | 1367.714396 | 1232.190047 | 1196.524784 | 1135.863036 | 1090.199900 | 1060.553002 | 989.441812 | 917.369776 | 985.990590 |
| Worcester | 643.542399 | 616.473972 | 657.726991 | 608.166828 | 572.476104 | 601.253296 | 557.011139 | 575.389895 | 551.444982 | 496.471837 | 475.843472 | 447.628667 | 420.763103 | 406.786384 | 381.230693 | 344.888325 | 375.993686 |
| Wychavon | 1447.518551 | 1504.532587 | 1542.140801 | 1467.778761 | 1367.405328 | 1425.701966 | 1300.782922 | 1309.649052 | 1311.106432 | 1260.514044 | 1248.715685 | 1232.972760 | 1205.932168 | 1196.208234 | 1145.581204 | 1013.927972 | 1148.046126 |
| Wyre Forest | 697.678624 | 699.747403 | 750.464334 | 692.233167 | 628.671292 | 664.291809 | 616.665147 | 630.668567 | 612.868789 | 570.347427 | 543.650712 | 522.982812 | 499.390383 | 489.710354 | 458.909804 | 408.559610 | 436.541231 |
# Features: emissions for 2005-2020 (label slicing with .loc includes both ends); target: 2021
X = data_transposed.loc[:, 2005:2020]
y = data_transposed[2021]
# Hold out 20% of the authorities as a test set (fixed seed for reproducibility)
test_size = 0.2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=10)
# Train the model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Inspect the fitted parameters
print("Intercept:", regressor.intercept_)
print("Coefficients:", regressor.coef_)
# Predict 2021 emissions for the held-out authorities
y_pred = regressor.predict(X_test)
# Print the shapes of the predictions and test labels
print(y_pred.shape, y_test.shape)
Intercept: -16.304564086553683
Coefficients: [-0.164813 0.01567557 0.30325461 -0.30009949 0.24781068 0.32482985 -0.01417399 -0.26335244 -0.28176354 0.05105317 -0.09310551 0.08112285 0.73103766 0.12332474 -0.58931361 0.84632316]
(6,) (6,)
X
| Local Authority / Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Birmingham | 7059.334372 | 6952.112034 | 6800.522322 | 6744.677640 | 6099.737055 | 6412.804284 | 5805.907414 | 6094.847957 | 5928.649539 | 5283.528982 | 5187.812112 | 5018.990855 | 4952.980766 | 4694.297182 | 4500.378294 | 4177.186378 |
| Bromsgrove | 956.642291 | 966.027098 | 1014.325992 | 966.508457 | 915.239428 | 943.043254 | 865.875640 | 885.439318 | 875.957613 | 847.917226 | 839.358149 | 828.019311 | 799.333806 | 765.570975 | 756.273104 | 655.139864 |
| Cannock Chase | 588.979929 | 597.874966 | 584.212829 | 554.965633 | 534.431424 | 565.871617 | 516.643905 | 523.651492 | 521.293200 | 463.806795 | 448.224377 | 432.415392 | 438.056787 | 428.182712 | 400.378294 | 360.878619 |
| Coventry | 2285.257172 | 2244.459643 | 2109.501041 | 2028.053404 | 1835.162506 | 1962.611578 | 1766.002096 | 1881.254900 | 1815.756821 | 1664.671871 | 1647.571233 | 1574.337656 | 1548.946359 | 1496.667926 | 1428.701726 | 1280.312165 |
| Dudley | 1983.590348 | 1972.154700 | 1888.505634 | 1818.345968 | 1657.763334 | 1778.665829 | 1603.028674 | 1645.915361 | 1619.695684 | 1452.515600 | 1412.477913 | 1356.141083 | 1347.120418 | 1295.213550 | 1217.885198 | 1079.756656 |
| East Staffordshire | 1287.270750 | 1298.775799 | 1260.546128 | 1245.328814 | 1170.588639 | 1228.732489 | 1160.773439 | 1165.877455 | 1143.246946 | 1050.973296 | 1014.291935 | 988.598271 | 981.240089 | 966.664142 | 903.141998 | 810.544121 |
| Herefordshire, County of | 2091.684474 | 2090.207781 | 2038.853224 | 1975.986634 | 1872.500149 | 1994.484316 | 1849.438619 | 1892.331768 | 1847.560317 | 1777.587181 | 1696.464807 | 1660.690961 | 1618.084774 | 1598.639996 | 1527.132610 | 1364.626064 |
| Lichfield | 1004.513597 | 1022.789515 | 1013.242837 | 993.374843 | 955.231477 | 981.016859 | 931.692962 | 938.137617 | 935.001562 | 875.810423 | 863.532062 | 850.456232 | 851.408592 | 837.842706 | 813.911260 | 691.404937 |
| Malvern Hills | 772.773159 | 799.621230 | 848.442583 | 797.397569 | 757.938462 | 782.645008 | 723.285139 | 726.094378 | 717.922272 | 695.654404 | 688.580938 | 676.822279 | 655.588099 | 642.098370 | 619.590085 | 538.434311 |
| Newcastle-under-Lyme | 1121.822896 | 1113.654899 | 1107.267011 | 1087.895669 | 1042.573930 | 1083.119199 | 1024.749999 | 1024.635624 | 1021.924809 | 939.404697 | 969.297348 | 944.169090 | 929.997916 | 911.784056 | 875.110449 | 784.361909 |
| North Warwickshire | 1147.478903 | 1203.041332 | 1226.193958 | 1194.492043 | 1121.940224 | 1191.290419 | 1085.180037 | 1110.955808 | 1077.930006 | 1065.117153 | 1060.260000 | 1029.729619 | 1025.527236 | 1001.285910 | 961.051901 | 823.867578 |
| Nuneaton and Bedworth | 813.097075 | 815.719471 | 861.239859 | 828.813779 | 772.443801 | 765.017013 | 677.858160 | 704.027486 | 684.424974 | 638.996823 | 634.593829 | 606.356718 | 597.691045 | 593.546030 | 563.796985 | 505.938444 |
| Redditch | 585.906245 | 591.548822 | 632.697674 | 593.920918 | 533.689582 | 577.553981 | 522.724477 | 528.282173 | 500.489462 | 468.059313 | 444.165840 | 421.000712 | 406.630410 | 394.836203 | 383.634496 | 332.680616 |
| Rugby | 2330.213067 | 2373.288078 | 2651.893434 | 2411.372020 | 2278.320929 | 2283.729045 | 2334.076177 | 2124.141199 | 2157.994109 | 2133.777044 | 1993.474564 | 2109.935757 | 2000.143439 | 2041.847091 | 1988.209932 | 1853.518649 |
| Sandwell | 2289.664199 | 2311.288395 | 2271.410144 | 2149.609456 | 1915.683984 | 2035.995284 | 1862.670040 | 1942.089900 | 1917.089725 | 1748.667558 | 1785.651958 | 1577.895063 | 1587.577139 | 1534.233142 | 1453.591967 | 1315.105902 |
| Shropshire | 3788.039158 | 3772.811560 | 3701.754274 | 3605.369762 | 3330.136985 | 3537.762726 | 3277.426402 | 3379.256702 | 3297.401331 | 3129.205145 | 3092.862487 | 3029.587506 | 2980.901584 | 2986.369707 | 2882.366244 | 2645.075805 |
| Solihull | 1735.803608 | 1801.630633 | 1757.701820 | 1667.849693 | 1565.457992 | 1683.199281 | 1588.399097 | 1667.768397 | 1627.491397 | 1492.473058 | 1518.294020 | 1475.326158 | 1427.841084 | 1410.483026 | 1327.805102 | 1145.680549 |
| South Staffordshire | 1142.941882 | 1174.670664 | 1167.283617 | 1104.231784 | 1101.010671 | 1130.681369 | 1083.159234 | 1088.067517 | 1082.128759 | 1028.457756 | 1000.357801 | 1002.784932 | 1034.087269 | 1000.812994 | 973.519842 | 870.868301 |
| Stafford | 1593.609395 | 1616.541260 | 1569.472252 | 1566.564484 | 1533.426323 | 1571.727852 | 1488.504814 | 1541.704997 | 1510.290365 | 1422.200740 | 1409.541939 | 1378.918704 | 1343.494927 | 1325.675278 | 1270.246497 | 1101.396010 |
| Staffordshire Moorlands | 1776.923545 | 1766.082186 | 1742.557987 | 1656.608649 | 1559.741550 | 1655.713572 | 1603.156284 | 1544.428454 | 1564.011891 | 1546.833793 | 1525.699941 | 1497.001650 | 1413.357126 | 1475.730299 | 1386.199936 | 1274.232906 |
| Stoke-on-Trent | 2041.728131 | 2025.362057 | 1977.856972 | 1900.940420 | 1777.333094 | 1764.352853 | 1581.980925 | 1584.479240 | 1577.177978 | 1606.524597 | 1550.265296 | 1507.331663 | 1261.650147 | 1223.080833 | 1153.929946 | 1020.164468 |
| Stratford-on-Avon | 1501.480094 | 1513.735452 | 1541.393414 | 1488.757386 | 1386.832211 | 1436.903217 | 1352.934775 | 1446.126565 | 1414.636678 | 1431.627278 | 1322.595288 | 1324.473512 | 1317.037715 | 1314.087715 | 1223.348640 | 1042.939059 |
| Tamworth | 475.068557 | 474.697517 | 448.760723 | 436.457918 | 421.964679 | 440.200738 | 404.836659 | 414.618735 | 406.701043 | 363.694164 | 347.631884 | 326.917746 | 317.951130 | 318.231567 | 290.820254 | 254.020488 |
| Telford and Wrekin | 1659.122170 | 1685.979582 | 1682.256128 | 1530.383297 | 1419.460432 | 1502.746044 | 1426.316317 | 1494.792938 | 1385.950669 | 1274.826827 | 1254.843761 | 1160.254419 | 1154.173983 | 1183.265359 | 1119.790339 | 952.853681 |
| Walsall | 1841.638129 | 1829.666727 | 1772.033335 | 1673.938533 | 1488.754727 | 1624.627863 | 1525.409270 | 1549.103445 | 1531.029950 | 1380.643572 | 1300.406164 | 1250.226818 | 1242.130066 | 1201.211275 | 1119.598232 | 1009.154672 |
| Warwick | 1391.720595 | 1400.843990 | 1409.130871 | 1362.768363 | 1273.867598 | 1277.589915 | 1173.237382 | 1216.060961 | 1177.002231 | 1071.245807 | 1097.625980 | 1065.827135 | 1033.018272 | 1010.087886 | 964.526191 | 797.752307 |
| Wolverhampton | 1677.179103 | 1629.700831 | 1587.437458 | 1536.582527 | 1398.213887 | 1493.486284 | 1370.306537 | 1402.082164 | 1367.714396 | 1232.190047 | 1196.524784 | 1135.863036 | 1090.199900 | 1060.553002 | 989.441812 | 917.369776 |
| Worcester | 643.542399 | 616.473972 | 657.726991 | 608.166828 | 572.476104 | 601.253296 | 557.011139 | 575.389895 | 551.444982 | 496.471837 | 475.843472 | 447.628667 | 420.763103 | 406.786384 | 381.230693 | 344.888325 |
| Wychavon | 1447.518551 | 1504.532587 | 1542.140801 | 1467.778761 | 1367.405328 | 1425.701966 | 1300.782922 | 1309.649052 | 1311.106432 | 1260.514044 | 1248.715685 | 1232.972760 | 1205.932168 | 1196.208234 | 1145.581204 | 1013.927972 |
| Wyre Forest | 697.678624 | 699.747403 | 750.464334 | 692.233167 | 628.671292 | 664.291809 | 616.665147 | 630.668567 | 612.868789 | 570.347427 | 543.650712 | 522.982812 | 499.390383 | 489.710354 | 458.909804 | 408.559610 |
| Total | 49732.222417 | 49865.040184 | 49616.825647 | 47689.374420 | 44287.997799 | 46396.818960 | 43080.033683 | 44031.880064 | 43181.893928 | 40413.744458 | 39570.616279 | 38433.656517 | 37482.255734 | 36805.003905 | 35080.103036 | 31372.640144 |
y
Local Authority
Birmingham 4480.654084
Bromsgrove 711.760256
Cannock Chase 364.042820
Coventry 1344.000349
Dudley 1163.033149
East Staffordshire 853.521292
Herefordshire, County of 1472.865799
Lichfield 754.890754
Malvern Hills 601.444378
Newcastle-under-Lyme 844.190037
North Warwickshire 919.630410
Nuneaton and Bedworth 540.854821
Redditch 357.787785
Rugby 2026.454350
Sandwell 1401.088684
Shropshire 2773.370585
Solihull 1226.181599
South Staffordshire 954.939263
Stafford 1166.734395
Staffordshire Moorlands 1332.991944
Stoke-on-Trent 1127.356084
Stratford-on-Avon 1120.139928
Tamworth 268.838288
Telford and Wrekin 1002.876199
Walsall 1088.510816
Warwick 892.125608
Wolverhampton 985.990590
Worcester 375.993686
Wychavon 1148.046126
Wyre Forest 436.541231
Name: 2021, dtype: float64
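The `X` preview above ends with a `Total` row, while `y` lists only the 30 local authorities. A total is an aggregate of the other rows rather than an authority in its own right, so if it is present at modelling time it acts as an extreme, high-leverage observation (and `X` and `y` would no longer have the same length). A minimal sketch of a guard, assuming the aggregate row is labelled exactly `"Total"`; the helper name `drop_total_row` and the demo values are illustrative, not part of the original notebook:

```python
import pandas as pd

def drop_total_row(frame: pd.DataFrame, label: str = "Total") -> pd.DataFrame:
    """Return `frame` without the aggregate row named `label`.

    errors="ignore" makes this a no-op when the row is absent, so it is safe
    to call on frames that never had a total appended.
    """
    return frame.drop(index=label, errors="ignore")

# Tiny illustrative frame (made-up values, not from the dataset)
demo = pd.DataFrame(
    {2020: [10.0, 20.0, 30.0], 2021: [11.0, 21.0, 32.0]},
    index=pd.Index(["Authority A", "Authority B", "Total"], name="Local Authority"),
)
clean = drop_total_row(demo)
print(clean.index.tolist())
```

In the notebook it would be applied before slicing, for example `X = drop_total_row(data_transposed).loc[:, 2005:2020]`.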
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
west_midlands_local_authority_emissions
X = west_midlands_local_authority_emissions.loc[:, 2005:2020]
y = west_midlands_local_authority_emissions[2021]
test_size = 0.2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=10)
# Train the model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Inspect the fitted parameters
print("Intercept:", regressor.intercept_)
print("Coefficients:", regressor.coef_)
# Predict the test-set results and show them next to the actual 2021 values
y_pred = regressor.predict(X_test)
print(y_pred.shape, y_test.shape)
y_pred, y_test
Intercept: -16.304564086553683
Coefficients: [-0.164813 0.01567557 0.30325461 -0.30009949 0.24781068 0.32482985 -0.01417399 -0.26335244 -0.28176354 0.05105317 -0.09310551 0.08112285 0.73103766 0.12332474 -0.58931361 0.84632316]
(6,) (6,)
(array([1154.20076885, 739.54101022, 873.7407262 , 390.29534116,
1351.3682755 , 1136.4148487 ]),
Local Authority
Stoke-on-Trent 1127.356084
Lichfield 754.890754
East Staffordshire 853.521292
Cannock Chase 364.042820
Coventry 1344.000349
Stratford-on-Avon 1120.139928
Name: 2021, dtype: float64)
Based on the output of the linear regression model, the prediction equation for 2021 emissions can be written out explicitly. Here \( X_{y} \) denotes a local authority's emissions in year \( y \) (for \( y = 2005, 2006, \ldots, 2020 \)) and \( c_{y} \) is the fitted coefficient for that year:
\[\text{Emissions}_{2021} = \text{Intercept} + c_{2005} \cdot X_{2005} + c_{2006} \cdot X_{2006} + \cdots + c_{2020} \cdot X_{2020} \]
Plugging in the values from the model:
\[\text{Emissions}_{2021} = -16.30 - 0.16 \cdot X_{2005} + 0.02 \cdot X_{2006} + 0.30 \cdot X_{2007} - 0.30 \cdot X_{2008} + 0.25 \cdot X_{2009} + 0.32 \cdot X_{2010} - 0.01 \cdot X_{2011} - 0.26 \cdot X_{2012} - 0.28 \cdot X_{2013} + 0.05 \cdot X_{2014} - 0.09 \cdot X_{2015} + 0.08 \cdot X_{2016} + 0.73 \cdot X_{2017} + 0.12 \cdot X_{2018} - 0.59 \cdot X_{2019} + 0.85 \cdot X_{2020} \]
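This equation permits estimates of a local authority's 2021 emissions from its emissions in 2005 to 2020. Each coefficient describes how the predicted 2021 value changes per kt CO2e change in that year's emissions, with all other years held fixed. Because the yearly columns are strongly correlated with one another, the individual coefficients are unstable (note their alternating signs) and should not be interpreted in isolation. The intercept is the prediction when every yearly input is zero, a point far outside the data, so it has no direct physical meaning.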
| Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Coefficient | -0.164813 | 0.01567557 | 0.30325461 | -0.30009949 | 0.24781068 | 0.32482985 | -0.01417399 | -0.26335244 | -0.28176354 | 0.05105317 | -0.09310551 | 0.08112285 | 0.73103766 | 0.12332474 | -0.58931361 | 0.84632316 |
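To make the equation concrete, it can be evaluated by hand for one authority. The sketch below takes the intercept and coefficients printed by the model and Stoke-on-Trent's 2005 to 2020 values from the tables above. Because it uses the unrounded coefficients rather than the two-decimal versions in the equation, the result should agree closely with the model's first test-set prediction (about 1154.20 kt CO2e, against an actual 2021 value of 1127.36).

```python
import numpy as np

# Intercept and 2005..2020 coefficients, copied from the printed model output
intercept = -16.304564086553683
coefficients = np.array([
    -0.164813, 0.01567557, 0.30325461, -0.30009949, 0.24781068, 0.32482985,
    -0.01417399, -0.26335244, -0.28176354, 0.05105317, -0.09310551, 0.08112285,
    0.73103766, 0.12332474, -0.58931361, 0.84632316,
])

# Stoke-on-Trent emissions for 2005..2020 (kt CO2e), from the tables above
stoke_2005_2020 = np.array([
    2041.728131, 2025.362057, 1977.856972, 1900.940420, 1777.333094, 1764.352853,
    1581.980925, 1584.479240, 1577.177978, 1606.524597, 1550.265296, 1507.331663,
    1261.650147, 1223.080833, 1153.929946, 1020.164468,
])

# Emissions_2021 = Intercept + sum over years of c_y * X_y
prediction_2021 = intercept + coefficients @ stoke_2005_2020
print(round(float(prediction_2021), 2))
```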
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Calculate Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
# Calculate R^2 Score
r2 = r2_score(y_test, y_pred)
# Calculating the Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
print("R-squared Score:", r2)
print("Root Mean Squared Error:", rmse)
Mean Absolute Error: 18.71820519177972
Mean Squared Error: 395.57191898109664
R-squared Score: 0.996065760110844
Root Mean Squared Error: 19.88898989343342
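These metrics rest on only six test authorities, so a different `random_state` could move them noticeably. K-fold cross-validation averages over several splits and gives a steadier picture. A sketch under stated assumptions: the helper `cv_rmse`, the choice of five folds and the synthetic demo data are illustrative, not taken from the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

def cv_rmse(X, y, n_splits=5, random_state=10):
    """RMSE of a linear regression on each of `n_splits` shuffled folds."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    # scikit-learn scorers follow "higher is better", so RMSE comes back negated
    scores = cross_val_score(LinearRegression(), X, y, cv=folds,
                             scoring="neg_root_mean_squared_error")
    return -scores

# Synthetic stand-in for the 30 x 16 authority-by-year matrix (not real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(1000, 300, size=(30, 16))
y_demo = 0.9 * X_demo[:, -1] + rng.normal(0, 20, size=30)

fold_rmse = cv_rmse(X_demo, y_demo)
print(fold_rmse.round(1), round(float(fold_rmse.mean()), 1))
```

Calling `cv_rmse(X, y)` would apply the same procedure to the notebook's feature matrix and target.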
import numpy as np
from sklearn.metrics import mean_squared_error
n = len(y_train)  # number of observations: the training-set size (24 rows)
k = regressor.coef_.shape[0] + 1  # number of parameters (including the intercept)
# Note: the MSE, residuals and SSE below come from the 6-row test set while n is the
# training-set size, so this log-likelihood mixes two samples and is only a rough figure
mse = mean_squared_error(y_test, y_pred)
residuals = y_test - y_pred
sse = np.sum(residuals**2)
# Gaussian log-likelihood, with the error variance estimated by the test MSE
sigma_squared = mse
log_likelihood = -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma_squared) - (1 / (2 * sigma_squared)) * sse
# AIC and BIC
aic = 2 * k - 2 * log_likelihood
bic = np.log(n) * k - 2 * log_likelihood
print(f"Log-likelihood: {log_likelihood}")
print(f"AIC: {aic}")
print(f"BIC: {bic}")
Log-likelihood: -96.8185161632904
AIC: 227.6370323265808
BIC: 247.66394744249587
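A consistent alternative computes every quantity from a single sample. AIC and BIC are usually reported in-sample, on the training residuals, with the maximum-likelihood variance \( \hat\sigma^2 = \text{SSE}/n \); the Gaussian log-likelihood then simplifies to \( -\tfrac{n}{2}\left(\ln(2\pi\hat\sigma^2) + 1\right) \). The helper below is a sketch of that convention (the name `gaussian_aic_bic` is hypothetical). Some texts also count \( \sigma^2 \) as an extra parameter, which would add one to `n_params`.

```python
import numpy as np

def gaussian_aic_bic(y_true, y_pred, n_params):
    """Log-likelihood, AIC and BIC of a least-squares fit under i.i.d. Gaussian errors.

    n, the SSE and the variance estimate all come from the same residuals,
    so the three quantities are mutually consistent.
    """
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    n = residuals.size
    sse = np.sum(residuals ** 2)
    sigma_squared = sse / n  # maximum-likelihood estimate of the error variance
    # With sigma^2 = SSE / n, the term SSE / (2 sigma^2) equals n / 2
    log_likelihood = -0.5 * n * (np.log(2 * np.pi * sigma_squared) + 1)
    aic = 2 * n_params - 2 * log_likelihood
    bic = np.log(n) * n_params - 2 * log_likelihood
    return log_likelihood, aic, bic

# Toy check: residuals of +/-0.5 on four points (illustrative values)
ll, aic, bic = gaussian_aic_bic([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], n_params=2)
print(ll, aic, bic)
```

In this notebook it would be called as `gaussian_aic_bic(y_train, regressor.predict(X_train), k)`, which uses the 24 training rows throughout.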
from joblib import dump, load
# Save the model to a file
dump(regressor, 'regression_model.joblib')
['regression_model.joblib']
# Load the model from the file
model = load('regression_model.joblib')
# Use the reloaded model to make predictions
predictions = model.predict(X_test)
predictions
array([1154.20076885, 739.54101022, 873.7407262 , 390.29534116,
1351.3682755 , 1136.4148487 ])
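Before relying on the saved file, it is worth checking that the reloaded estimator reproduces the original predictions. The sketch below does this on a small synthetic problem inside a temporary directory; the data and file path are illustrative. Note that scikit-learn's model-persistence guidance recommends loading a pickled model with the same scikit-learn version that saved it.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Small synthetic regression problem (illustrative only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 3.0

original = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "regression_model.joblib")
    dump(original, path)      # serialise the fitted estimator
    reloaded = load(path)     # read it back into a new object

# The reloaded estimator should reproduce the original predictions
np.testing.assert_allclose(original.predict(X_demo), reloaded.predict(X_demo))
print("round trip OK")
```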
Regularization
from sklearn.linear_model import Ridge
# alpha is the regularization strength
ridge_regressor = Ridge(alpha=1.0)
ridge_regressor.fit(X_train, y_train)
# Inspect the fitted parameters
print("Ridge Intercept:", ridge_regressor.intercept_)
print("Ridge Coefficients:", ridge_regressor.coef_)
Ridge Intercept: -16.31947799472755
Ridge Coefficients: [-0.16475797 0.01509699 0.30314981 -0.29964514 0.24736677 0.32550167 -0.01433364 -0.26439064 -0.27966232 0.05076292 -0.09322371 0.08056498 0.72973804 0.12303375 -0.58689111 0.84544592]
from sklearn.linear_model import Ridge
# alpha is the regularization strength
ridge_regressor = Ridge(alpha=100.0)
ridge_regressor.fit(X_train, y_train)
# Inspect the fitted parameters
print("Ridge Intercept:", ridge_regressor.intercept_)
print("Ridge Coefficients:", ridge_regressor.coef_)
Ridge Intercept: -16.6619146624912
Ridge Coefficients: [-0.16608111 -0.00764947 0.28522153 -0.25679853 0.21426021 0.35359872 -0.01749576 -0.32572873 -0.15329251 0.03704983 -0.10297709 0.0527268 0.64133955 0.08796357 -0.41705216 0.78943365]
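Raising alpha from 1 to 100 shifts the coefficients above only moderately. Part of the reason is scale: the features are raw emissions in kt CO2e, typically hundreds to thousands, so the entries of \( X^\top X \) are large compared with the penalty. The sketch below, on synthetic correlated features (an assumption standing in for the yearly columns), shows the ridge behaviour that alpha controls: the L2 norm of the coefficient vector shrinks as alpha grows.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-in for the yearly columns: 30 "authorities" x 16 "years" that
# share a common level, so the columns are strongly correlated (not real data)
rng = np.random.default_rng(2)
level = rng.normal(1000, 300, size=(30, 1))
X_demo = level + rng.normal(0, 20, size=(30, 16))
y_demo = 0.7 * level[:, 0] + rng.normal(0, 15, size=30)

alphas = [0.01, 1.0, 100.0, 1e4, 1e6]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X_demo, y_demo).coef_
    norms.append(np.linalg.norm(coef))  # Euclidean (L2) size of the coefficients
    print(f"alpha = {alpha:>9g}: ||coef||_2 = {norms[-1]:.4f}")
```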
from sklearn.linear_model import Lasso
# alpha is the regularization strength
# LASSO: Least Absolute Shrinkage and Selection Operator
lasso_regressor = Lasso(alpha=0.1)
lasso_regressor.fit(X_train, y_train)
# Inspect the fitted parameters
print("Lasso Intercept:", lasso_regressor.intercept_)
print("Lasso Coefficients:", lasso_regressor.coef_)
Lasso Intercept: -20.365287808106586
Lasso Coefficients: [-0.12890393 -0.04281725 0.17837908 0.07241432 0.05918155 -0.09031056 -0.05182621 -0.20454203 0.13725205 0.18819251 -0.13577117 0.3374395 0.11786835 -0.0764537 0.19038713 0.45120508]
C:\ProgramData\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.890e+03, tolerance: 1.837e+03 model = cd_fast.enet_coordinate_descent(
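The convergence warning itself suggests checking the scale of the features. One common remedy is to standardise the columns inside a pipeline and allow more iterations (`StandardScaler` is imported earlier in the notebook but not used in these cells). This is a sketch of that approach rather than the notebook's method; the helper `scaled_lasso`, the `alpha` value and the synthetic data are assumptions. Alpha then applies to standardised coefficients, so its values are not comparable with the unscaled fits above.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def scaled_lasso(alpha=1.0, max_iter=100_000):
    """Pipeline that standardises each column before fitting a Lasso."""
    return make_pipeline(StandardScaler(), Lasso(alpha=alpha, max_iter=max_iter))

# Synthetic correlated features standing in for the 2005-2020 columns (not real data)
rng = np.random.default_rng(3)
level = rng.normal(1000, 300, size=(30, 1))
X_demo = level + rng.normal(0, 20, size=(30, 16))
y_demo = 0.7 * level[:, 0] + rng.normal(0, 15, size=30)

model = scaled_lasso(alpha=1.0).fit(X_demo, y_demo)
lasso_coef = model.named_steps["lasso"].coef_  # coefficients on the standardised scale
print("non-zero coefficients:", np.count_nonzero(lasso_coef), "of", lasso_coef.size)
```

On the notebook's data the equivalent call would be `scaled_lasso(alpha=1.0).fit(X_train, y_train)`.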
from sklearn.linear_model import ElasticNet
# alpha is the regularization strength and l1_ratio controls the ratio of L1 to L2 penalty
elasticnet_regressor = ElasticNet(alpha=0.1, l1_ratio=0.5)
elasticnet_regressor.fit(X_train, y_train)
# Inspect the fitted parameters
print("ElasticNet Intercept:", elasticnet_regressor.intercept_)
print("ElasticNet Coefficients:", elasticnet_regressor.coef_)
ElasticNet Intercept: -20.36437515103262
ElasticNet Coefficients: [-0.12891403 -0.04279652 0.17837821 0.07241618 0.05919524 -0.09035966 -0.05182048 -0.20454239 0.13730612 0.18817715 -0.13582386 0.3374735 0.11786115 -0.07647005 0.19042191 0.45119084]
C:\ProgramData\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.888e+03, tolerance: 1.837e+03 model = cd_fast.enet_coordinate_descent(
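ElasticNet has two tuning parameters, `alpha` and `l1_ratio`, and the values above were fixed by hand. `ElasticNetCV` can choose both by cross-validation. The sketch below combines it with standardisation; the candidate grids, the fold count and the synthetic data are illustrative assumptions. In this simple form the scaler is fitted once on all rows passed to `fit`, so the internal folds share its statistics.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic correlated features (illustrative, not the emissions data)
rng = np.random.default_rng(4)
level = rng.normal(1000, 300, size=(30, 1))
X_demo = level + rng.normal(0, 50, size=(30, 16))
y_demo = 0.7 * level[:, 0] + rng.normal(0, 15, size=30)

enet_alphas = np.logspace(-2, 2, 20)  # candidate penalty strengths
enet_cv = make_pipeline(
    StandardScaler(),
    ElasticNetCV(
        l1_ratio=[0.2, 0.5, 0.8, 1.0],  # candidate L1/L2 mixes (1.0 is pure Lasso)
        alphas=enet_alphas,
        cv=KFold(n_splits=5, shuffle=True, random_state=10),
        max_iter=10_000,
    ),
)
enet_cv.fit(X_demo, y_demo)
fitted = enet_cv.named_steps["elasticnetcv"]
print("chosen alpha:", fitted.alpha_, "chosen l1_ratio:", fitted.l1_ratio_)
```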
from sklearn.linear_model import RidgeCV
# Define a set of alpha values to test
alphas = [0.1, 1.0, 10.0, 100.0]
# Create the RidgeCV regressor (store_cv_values / cv_values_ were renamed
# store_cv_results / cv_results_ in scikit-learn 1.5)
ridge_cv = RidgeCV(alphas=alphas, store_cv_values=True)
# Fit the model to the training data
ridge_cv.fit(X_train, y_train)
# Retrieve the best alpha and coefficients
best_alpha = ridge_cv.alpha_
print("Best Alpha:", best_alpha)
print("RidgeCV Intercept:", ridge_cv.intercept_)
print("RidgeCV Coefficients:", ridge_cv.coef_)
# Average the stored leave-one-out squared errors over samples: one MSE per alpha
cv_mse = np.mean(ridge_cv.cv_values_, axis=0)
print("Mean Squared Errors for each alpha:", cv_mse)
Best Alpha: 100.0
RidgeCV Intercept: -16.66191276546465
RidgeCV Coefficients: [-0.16608111 -0.00764947 0.28522153 -0.25679853 0.21426021 0.35359872 -0.01749576 -0.32572873 -0.15329251 0.03704983 -0.10297709 0.0527268 0.64133955 0.08796357 -0.41705216 0.78943365]
Mean Squared Errors for each alpha: [1251.71611458 1244.52879052 1180.01026315 864.35242285]
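The selected alpha, 100.0, is the largest value in the grid, and the cross-validated error is still falling at that end (864.35 against 1180.01 at alpha = 10), so the optimum may lie beyond 100. A sketch of a wider, log-spaced search follows; the grid bounds and the synthetic data are assumptions. If the chosen alpha again sits on an edge, the grid would need extending further in that direction.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic correlated features (illustrative, not the emissions data)
rng = np.random.default_rng(5)
level = rng.normal(1000, 300, size=(30, 1))
X_demo = level + rng.normal(0, 20, size=(30, 16))
y_demo = 0.7 * level[:, 0] + rng.normal(0, 15, size=30)

# 13 log-spaced candidates from 1e-2 to 1e6, far wider than [0.1, 1, 10, 100]
alphas = np.logspace(-2, 6, 13)
ridge_wide = RidgeCV(alphas=alphas).fit(X_demo, y_demo)  # efficient leave-one-out CV
best = ridge_wide.alpha_
on_edge = bool(np.isclose(best, alphas[0]) or np.isclose(best, alphas[-1]))
print(f"best alpha = {best:g}, on the edge of the grid: {on_edge}")
```

With the notebook's data, `RidgeCV(alphas=np.logspace(-2, 6, 13)).fit(X_train, y_train)` would run the same search.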
from sklearn.metrics import mean_squared_error, r2_score
# Predict on the test set
y_test_pred = ridge_cv.predict(X_test)
# Calculate metrics
r2 = r2_score(y_test, y_test_pred)
mse = mean_squared_error(y_test, y_test_pred)
rmse = np.sqrt(mse)
print(f"R-squared Score: {r2}")
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
R-squared Score: 0.9971848346207
Mean Squared Error: 283.0535002220636
Root Mean Squared Error: 16.824193895163702
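An R² this close to 1 is expected here largely because emissions differ enormously between authorities and each authority's 2021 value tracks its earlier years. A simple reference point is a persistence forecast that carries each authority's 2020 value forward. The sketch below applies it to the six test authorities, using their 2020 and 2021 values from the tables above; the helper name `persistence_baseline` is illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def persistence_baseline(last_year_values, actual):
    """MAE and RMSE of forecasting each value as the previous year's value."""
    forecast = np.asarray(last_year_values, dtype=float)
    actual = np.asarray(actual, dtype=float)
    mae = mean_absolute_error(actual, forecast)
    rmse = np.sqrt(mean_squared_error(actual, forecast))
    return mae, rmse

# Test-set authorities: Stoke-on-Trent, Lichfield, East Staffordshire,
# Cannock Chase, Coventry, Stratford-on-Avon (kt CO2e, from the tables above)
emissions_2020 = [1020.164468, 691.404937, 810.544121, 360.878619, 1280.312165, 1042.939059]
emissions_2021 = [1127.356084, 754.890754, 853.521292, 364.042820, 1344.000349, 1120.139928]

mae, rmse = persistence_baseline(emissions_2020, emissions_2021)
print(f"Persistence baseline: MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

This baseline gives a mean absolute error of about 59.6 kt CO2e on these six authorities, against 18.7 for the linear model reported above, so the regression adds information beyond the previous year's value. In the notebook the same check is `persistence_baseline(X_test[2020], y_test)`.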
# Define a set of alpha values to test
alphas = [0.1, 1.0, 10.0, 100.0]
# Create the RidgeCV regressor
ridge_cv = RidgeCV(alphas=alphas, scoring='neg_mean_absolute_error', store_cv_values=True)
# Fit the model to the training data
ridge_cv.fit(X_train, y_train)
# Retrieve the best alpha
best_alpha = ridge_cv.alpha_
print("Best Alpha:", best_alpha)
Best Alpha: 100.0
import matplotlib.pyplot as plt
import pandas as pd
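# Sum each year's column across authorities; drop a "Total" row if present so it is not double-counted
yearly_totals = west_midlands_local_authority_emissions.drop(index='Total', errors='ignore').sum()
plt.figure(figsize=(10, 6))
yearly_totals.plot(kind='line', marker='o')
plt.title('Yearly Total Emissions in West Midlands')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')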
plt.grid(True)
plt.show()
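The plotted trend can be summarised as a single figure. Using the `Total` row of the emissions table above, West Midlands emissions fell from about 49,732 kt CO2e in 2005 to about 33,737 kt CO2e in 2021. A minimal sketch of the calculation (the helper `percent_change` is illustrative):

```python
def percent_change(start: float, end: float) -> float:
    """Percentage change from start to end; negative values are reductions."""
    return (end - start) / start * 100.0

# West Midlands totals in kt CO2e, from the "Total" row of the emissions table
total_2005 = 49732.222417
total_2021 = 33736.855308

change = percent_change(total_2005, total_2021)
print(f"Change in total emissions, 2005 to 2021: {change:.1f}%")
```

This works out to a fall of roughly 32%, even though 2021 was higher than 2020 (31,373 kt CO2e).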
London
# Filter the data for the "London" Region/Country
London_data = data_1_1_actual[data_1_1_actual['Region/Country'] == 'London']
London_data
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3502 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2005 | 131.684063 | 17.459426 | 66.870493 | 28.426324 | 244.440306 | ... | 0.211348 | 1.239115 | 161.428095 | 7.486672 | 168.914767 | 1143.769451 | 166.275 | 6.878782 | 37.7992 | 30.259091 |
| 3503 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2006 | 139.895257 | 13.63688 | 66.154453 | 28.690501 | 248.377092 | ... | 0.396191 | 1.344134 | 149.864148 | 8.481513 | 158.345662 | 1137.669474 | 167.157 | 6.805994 | 37.7992 | 30.097713 |
| 3504 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2007 | 140.487938 | 12.727459 | 69.810178 | 30.324736 | 253.350311 | ... | 0.25175 | 1.199139 | 155.877849 | 8.75395 | 164.631799 | 1129.493177 | 169.031 | 6.682166 | 37.7992 | 29.881404 |
| 3505 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2008 | 135.419965 | 12.319912 | 68.498117 | 25.60148 | 241.839473 | ... | 0.151626 | 0.907199 | 128.605453 | 8.869047 | 137.4745 | 1059.170470 | 172.452 | 6.141828 | 37.7992 | 28.020976 |
| 3506 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2009 | 115.972501 | 12.15477 | 59.188809 | 21.28696 | 208.603041 | ... | 0.129006 | 0.841917 | 178.517672 | 8.801381 | 187.319052 | 1027.842511 | 177.58 | 5.788053 | 37.7992 | 27.192176 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4058 | London | Westminster | Westminster | E09000033 | 2017 | 47.698762 | 23.283152 | 0.085511 | 95.767013 | 166.834438 | ... | 0.017115 | 4.485572 | 9.338904 | 7.83116 | 17.170064 | 2062.990819 | 202.606251 | 10.182266 | 22.0301 | 93.644188 |
| 4059 | London | Westminster | Westminster | E09000033 | 2018 | 131.422791 | 59.29191 | 1.014299 | 94.752952 | 286.481952 | ... | 0.019147 | 15.027827 | 11.016105 | 7.773276 | 18.789381 | 1985.655839 | 201.909585 | 9.834381 | 22.0301 | 90.133764 |
| 4060 | London | Westminster | Westminster | E09000033 | 2019 | 106.191379 | 57.354249 | 0.268476 | 84.414842 | 248.228947 | ... | 0.021237 | 14.091068 | 9.521645 | 7.656918 | 17.178562 | 1768.356274 | 200.526526 | 8.818565 | 22.0301 | 80.270007 |
| 4061 | London | Westminster | Westminster | E09000033 | 2020 | 69.206872 | 52.611429 | 0.188609 | 87.61706 | 209.62397 | ... | 0.009882 | 14.423154 | 14.093663 | 7.487961 | 21.581624 | 1444.948575 | 200.752862 | 7.197649 | 22.0301 | 65.589742 |
| 4062 | London | Westminster | Westminster | E09000033 | 2021 | 94.854934 | 69.663544 | 0.186295 | 107.401328 | 272.106101 | ... | 0.040033 | 21.398514 | 4.464876 | 7.905342 | 12.370218 | 1671.932170 | 205.087 | 8.152307 | 22.0301 | 75.893081 |
561 rows × 50 columns
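The two derived columns at the right of this table can be reproduced from the others. Emissions are in kt CO2e and population is in thousands, so `Grand Total / Population` gives tonnes CO2e per person. Likewise, `Grand Total / Area (km2)` gives kt CO2e per km². The stdlib check below uses the Barking and Dagenham 2005 row shown above. The helper names are illustrative.

```python
def per_capita_t(grand_total_kt, population_thousands):
    """kt CO2e divided by thousands of people equals tonnes CO2e per person."""
    return grand_total_kt / population_thousands

def per_km2_kt(grand_total_kt, area_km2):
    """kt CO2e per square kilometre."""
    return grand_total_kt / area_km2

# Barking and Dagenham, 2005 (values copied from the table above)
grand_total = 1143.769451
population = 166.275
area = 37.7992

print(per_capita_t(grand_total, population))  # table: 6.878782
print(per_km2_kt(grand_total, area))          # table: 30.259091
```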
# Filter out rows where "Second Tier Authority" ends with "Total"
London_data_filtered = London_data[~London_data['Second Tier Authority'].str.endswith("Total")]
# List of columns to keep
columns_to_keep = ['Region/Country', 'Second Tier Authority', 'Local Authority', 'Local Authority Code', 'Calendar Year', 'LULUCF Net Emissions']
# Add columns that end with "Total"
columns_to_keep.extend([col for col in London_data_filtered.columns if col.endswith("Total")])
London_filtered_columns = London_data_filtered[columns_to_keep]
London_filtered_columns.to_csv('London_filtered_columns.csv')
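The column selection above keeps a fixed set of identifier columns, plus every column whose name ends in "Total". The stdlib sketch below applies the same rule to a handful of the column names visible in the table, which shows which sector breakdowns (e.g. 'Industry Electricity') are dropped. The function name is hypothetical.

```python
ID_COLUMNS = ('Region/Country', 'Second Tier Authority', 'Local Authority',
              'Local Authority Code', 'Calendar Year', 'LULUCF Net Emissions')

def select_total_columns(all_columns, id_columns=ID_COLUMNS):
    """Keep the identifier columns first, then every column ending in 'Total'."""
    kept = list(id_columns)
    kept.extend([c for c in all_columns if c.endswith("Total")])
    return kept

sample = ['Region/Country', 'Second Tier Authority', 'Local Authority',
          'Local Authority Code', 'Calendar Year', 'Industry Electricity',
          'Industry Gas', 'Industry Total', 'Agriculture Soils',
          'Agriculture Total', 'LULUCF Net Emissions', 'Grand Total',
          'Per Capita Emissions (tCO2e)']
kept = select_total_columns(sample)
print(kept)
```

Note that 'Grand Total' is caught by the same suffix rule, while the per-capita and per-km² columns are not.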
London_filtered_columns
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | LULUCF Net Emissions | Industry Total | Commercial Total | Public Sector Total | Domestic Total | Transport Total | Agriculture Total | Waste Management Total | Grand Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3502 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2005 | 3.00106 | 244.440306 | 137.676935 | 35.95596 | 355.209148 | 197.33216 | 1.239115 | 168.914767 | 1143.769451 |
| 3503 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2006 | 2.99693 | 248.377092 | 137.705936 | 33.476519 | 353.608435 | 201.814766 | 1.344134 | 158.345662 | 1137.669474 |
| 3504 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2007 | 2.915637 | 253.350311 | 136.537811 | 32.622496 | 342.227095 | 196.00889 | 1.199139 | 164.631799 | 1129.493177 |
| 3505 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2008 | 2.821984 | 241.839473 | 131.803592 | 31.038871 | 339.158195 | 174.126655 | 0.907199 | 137.4745 | 1059.170470 |
| 3506 | London | Barking and Dagenham | Barking and Dagenham | E09000002 | 2009 | 2.73088 | 208.603041 | 115.528181 | 27.75836 | 305.796533 | 179.264547 | 0.841917 | 187.319052 | 1027.842511 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4058 | London | Westminster | Westminster | E09000033 | 2017 | 1.201102 | 166.834438 | 1015.272687 | 247.337188 | 317.290917 | 293.39885 | 4.485572 | 17.170064 | 2062.990819 |
| 4059 | London | Westminster | Westminster | E09000033 | 2018 | 1.205057 | 286.481952 | 680.109173 | 392.924316 | 308.350644 | 282.767489 | 15.027827 | 18.789381 | 1985.655839 |
| 4060 | London | Westminster | Westminster | E09000033 | 2019 | 1.167881 | 248.228947 | 614.555353 | 326.036055 | 291.00727 | 256.091137 | 14.091068 | 17.178562 | 1768.356274 |
| 4061 | London | Westminster | Westminster | E09000033 | 2020 | 1.178764 | 209.62397 | 441.270459 | 279.303695 | 275.398011 | 202.168898 | 14.423154 | 21.581624 | 1444.948575 |
| 4062 | London | Westminster | Westminster | E09000033 | 2021 | 1.15641 | 272.106101 | 547.174354 | 312.312078 | 292.224507 | 213.189986 | 21.398514 | 12.370218 | 1671.932170 |
561 rows × 14 columns
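The "Total" filter leaves the row count unchanged (561 before and after). A complete panel would also give exactly this count: 561 = 33 authorities × 17 years. Before pivoting, it is worth confirming that each authority appears exactly once per year, because `pd.pivot_table` averages duplicate cells by default (`aggfunc='mean'`), which would hide them. The stdlib sketch below runs that check on (authority, year) pairs. The helper name and toy records are hypothetical.

```python
from collections import Counter

def check_panel(pairs, years):
    """Return (duplicates, missing) for a list of (authority, year) pairs.

    duplicates: pairs that occur more than once.
    missing: (authority, year) combinations absent for some authority.
    """
    counts = Counter(pairs)
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    authorities = sorted({a for a, _ in pairs})
    missing = [(a, y) for a in authorities for y in years if (a, y) not in counts]
    return duplicates, missing

years = range(2005, 2022)  # 17 calendar years
# Toy example: authority A is complete; B lacks 2010 and has 2011 twice
pairs = ([("A", y) for y in years]
         + [("B", y) for y in years if y != 2010]
         + [("B", 2011)])
dups, missing = check_panel(pairs, years)
print(dups)     # [('B', 2011)]
print(missing)  # [('B', 2010)]
```

In the notebook, the pairs could be built with `list(zip(London_filtered_columns['Local Authority'], London_filtered_columns['Calendar Year']))`.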
# Pivot the London local authority Grand Total emissions: one row per authority, one column per year
London_local_authority_emissions = pd.pivot_table(London_filtered_columns, values='Grand Total', index=['Local Authority'], columns=['Calendar Year'])
London_local_authority_emissions
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Authority | |||||||||||||||||
| Barking and Dagenham | 1143.769451 | 1137.669474 | 1129.493177 | 1059.170470 | 1027.842511 | 1074.458911 | 1063.789683 | 989.485100 | 916.863501 | 818.600754 | 751.829766 | 763.987306 | 719.250628 | 727.762858 | 768.496793 | 658.182793 | 713.257071 |
| Barnet | 1935.738379 | 1936.108212 | 1881.730350 | 1895.947916 | 1930.874717 | 1991.839989 | 1803.398317 | 1885.286564 | 1845.016535 | 1648.066549 | 1559.627617 | 1508.335748 | 1465.309243 | 1424.374426 | 1384.449107 | 1259.882102 | 1335.292920 |
| Bexley | 1401.965109 | 1375.118185 | 1317.338848 | 1290.522468 | 1202.145062 | 1308.786002 | 1135.618513 | 1201.956255 | 1154.077529 | 1063.552030 | 1036.689931 | 983.151027 | 929.335144 | 861.388300 | 841.844776 | 773.041608 | 815.969051 |
| Brent | 1658.208154 | 1610.531981 | 1612.339933 | 1676.850623 | 1706.601914 | 1656.682391 | 1461.257729 | 1682.292163 | 1631.283669 | 1363.327488 | 1267.429188 | 1221.079306 | 1204.987810 | 1170.613528 | 1079.773201 | 1016.575515 | 1074.536509 |
| Bromley | 1721.234100 | 1744.069163 | 1655.604912 | 1618.384935 | 1558.309805 | 1658.285845 | 1446.404601 | 1515.517494 | 1467.472313 | 1316.641882 | 1275.347526 | 1211.746038 | 1130.341644 | 1110.521398 | 1050.743698 | 953.423486 | 1015.820513 |
| Camden | 1877.355270 | 1980.016766 | 1922.148658 | 1883.867937 | 1702.861352 | 1755.224626 | 1608.445972 | 1707.674042 | 1639.823685 | 1414.589094 | 1306.655293 | 1221.825268 | 1114.872030 | 1088.164854 | 1008.814953 | 896.919156 | 1024.262763 |
| City of London | 1729.953162 | 1856.477516 | 1746.600671 | 1759.470248 | 1551.279506 | 1711.862311 | 1482.148882 | 1681.313826 | 1490.594281 | 1232.184891 | 1082.514456 | 905.430588 | 800.698851 | 782.393720 | 747.632764 | 627.448608 | 636.123249 |
| Croydon | 1931.756219 | 1885.574789 | 1797.101804 | 1750.064513 | 1660.492021 | 1799.915324 | 1568.043628 | 1674.327417 | 1569.004236 | 1377.384089 | 1324.491777 | 1265.995790 | 1202.474716 | 1189.143001 | 1129.272704 | 1053.903040 | 1133.503357 |
| Ealing | 1951.198985 | 1960.648553 | 1935.712137 | 1950.860111 | 1701.818270 | 1717.703808 | 1614.173621 | 1726.009810 | 1702.771238 | 1524.929566 | 1434.319509 | 1330.810411 | 1290.366345 | 1249.409822 | 1188.201957 | 1068.271237 | 1161.789919 |
| Enfield | 1748.232280 | 1879.598351 | 1880.963693 | 1709.688882 | 1611.718264 | 1675.202836 | 1494.829223 | 1603.918468 | 1524.868584 | 1392.830088 | 1383.876579 | 1324.108205 | 1249.082898 | 1236.464489 | 1183.076303 | 1132.513375 | 1235.139215 |
| Greenwich | 1416.217430 | 1399.930012 | 1352.399745 | 1358.336912 | 1286.963813 | 1367.163039 | 1248.558247 | 1229.056708 | 1141.496846 | 1150.469552 | 1037.972925 | 1003.465419 | 956.880814 | 876.691542 | 833.521963 | 779.530737 | 1005.375305 |
| Hackney | 1015.090396 | 1019.945637 | 997.809376 | 1001.874644 | 933.836421 | 951.613807 | 865.224889 | 955.281777 | 927.882351 | 817.697155 | 776.829266 | 726.695101 | 687.534745 | 658.318386 | 627.945840 | 564.444372 | 603.036087 |
| Hammersmith and Fulham | 1164.287846 | 1182.120138 | 1150.898762 | 1136.048815 | 1097.033177 | 1126.681085 | 1022.357948 | 1066.910579 | 1020.801818 | 886.021370 | 834.117936 | 778.062046 | 727.240760 | 706.517525 | 656.145366 | 584.334299 | 629.409587 |
| Haringey | 1150.764337 | 1151.937818 | 1100.946609 | 1132.659709 | 1023.782785 | 1032.509481 | 935.203949 | 982.412092 | 949.501300 | 843.402558 | 807.872375 | 770.459457 | 722.862852 | 704.308155 | 670.637200 | 623.277807 | 654.153606 |
| Harrow | 1157.228225 | 1134.624330 | 1106.392111 | 1137.399568 | 959.983249 | 1007.040283 | 903.490192 | 958.926033 | 922.261148 | 813.656352 | 781.637668 | 734.580010 | 690.318120 | 681.603271 | 658.230337 | 605.893547 | 619.922133 |
| Havering | 1601.459562 | 1598.010172 | 1544.703518 | 1483.401713 | 1382.402894 | 1416.720720 | 1295.381136 | 1278.794554 | 1258.665992 | 1180.899002 | 1248.646299 | 1140.345852 | 1094.380787 | 1094.881718 | 1098.686036 | 1015.769666 | 1068.894001 |
| Hillingdon | 2548.840600 | 2525.184957 | 2415.811059 | 2560.369310 | 2437.150973 | 2382.919921 | 2135.312249 | 2434.072324 | 2326.560077 | 1950.535081 | 1794.165112 | 1715.418258 | 1647.689809 | 1679.468281 | 1636.343960 | 1333.897331 | 1480.746188 |
| Hounslow | 1772.855740 | 1797.537911 | 1746.773293 | 1711.494746 | 1603.351410 | 1663.511505 | 1540.156144 | 1637.340729 | 1650.378602 | 1396.592922 | 1331.466709 | 1260.571268 | 1183.981985 | 1157.942074 | 1100.808422 | 941.553356 | 1014.384896 |
| Islington | 1299.180237 | 1332.113342 | 1299.171071 | 1325.751925 | 1206.337606 | 1236.015406 | 1099.386756 | 1186.656127 | 1107.362179 | 980.679548 | 902.989868 | 834.085166 | 786.642161 | 747.586416 | 679.912491 | 591.439432 | 647.611949 |
| Kensington and Chelsea | 1460.147803 | 1491.438543 | 1430.694540 | 1437.673702 | 1359.951614 | 1481.545658 | 1383.183063 | 1416.710254 | 1286.186476 | 1155.131148 | 1082.529289 | 963.693486 | 853.190474 | 820.022486 | 771.872211 | 676.400616 | 745.197494 |
| Kingston upon Thames | 954.370986 | 937.979827 | 911.299808 | 883.782973 | 845.347212 | 904.659399 | 788.772809 | 827.614950 | 805.684113 | 720.475253 | 699.599430 | 662.348984 | 627.061547 | 618.774868 | 595.144486 | 537.461326 | 580.307966 |
| Lambeth | 1591.531704 | 1589.885272 | 1533.699620 | 1525.115866 | 1404.945696 | 1487.334689 | 1307.766273 | 1373.170371 | 1338.415324 | 1178.332639 | 1123.105217 | 1057.054317 | 974.528754 | 958.136406 | 935.764679 | 869.575179 | 926.399530 |
| Lewisham | 1312.216548 | 1288.290759 | 1246.585915 | 1185.595845 | 1172.346263 | 1206.629859 | 1023.160788 | 1094.093279 | 1035.845945 | 980.412736 | 1041.195980 | 1071.542548 | 980.777232 | 885.387878 | 882.418783 | 718.296072 | 708.031141 |
| Merton | 1007.609326 | 988.472542 | 940.719068 | 1015.357564 | 963.685229 | 1029.989651 | 894.164062 | 938.417788 | 924.611568 | 829.852980 | 782.882984 | 757.408641 | 718.646925 | 714.676566 | 675.056968 | 614.463470 | 671.736286 |
| Newham | 1744.630226 | 1830.620132 | 1818.882395 | 1768.959537 | 1609.487202 | 1757.408490 | 1709.403699 | 1771.347903 | 1662.731646 | 1415.644378 | 1329.326402 | 1232.633135 | 1174.241991 | 1126.370788 | 1038.987877 | 922.088727 | 965.856167 |
| Redbridge | 1432.459955 | 1405.229768 | 1383.728645 | 1311.253544 | 1105.044707 | 1131.284648 | 1051.800533 | 1095.503833 | 1054.764229 | 957.742555 | 931.022889 | 893.056034 | 845.318100 | 842.031746 | 803.314860 | 740.645195 | 763.714114 |
| Richmond upon Thames | 1201.782909 | 1187.601188 | 1141.738613 | 1136.712776 | 978.363774 | 1032.344108 | 886.296299 | 952.667571 | 925.196965 | 815.919905 | 768.059699 | 727.257056 | 681.405923 | 667.002956 | 630.351139 | 574.260030 | 626.475697 |
| Southwark | 1924.908302 | 1921.816540 | 1867.658966 | 1837.044751 | 1716.097388 | 1822.044174 | 1578.813854 | 1770.222838 | 1647.746486 | 1450.835980 | 1398.612567 | 1285.978929 | 1145.315209 | 1071.835330 | 973.364347 | 834.861444 | 1186.671824 |
| Sutton | 968.323516 | 964.991571 | 931.097147 | 911.486700 | 890.105230 | 983.190574 | 837.676471 | 886.362736 | 849.687163 | 760.515268 | 725.267861 | 687.678530 | 660.063815 | 644.071389 | 612.751743 | 578.803760 | 618.062119 |
| Tower Hamlets | 2158.423511 | 2503.448121 | 2468.663440 | 2508.170331 | 2349.105439 | 2434.962191 | 2181.838489 | 2251.297375 | 2088.404756 | 1761.892820 | 1699.440759 | 1530.286529 | 1360.826182 | 1325.014601 | 1201.178300 | 1029.998327 | 1101.739705 |
| Waltham Forest | 1099.928040 | 1102.047793 | 1061.292407 | 1086.986978 | 1032.952970 | 1062.730408 | 969.089749 | 978.832988 | 965.824070 | 848.895931 | 808.178821 | 785.600762 | 774.148119 | 766.787364 | 760.130735 | 627.090537 | 649.847182 |
| Wandsworth | 1628.392425 | 1583.779240 | 1505.198016 | 1400.135772 | 1350.976697 | 1424.251272 | 1217.326252 | 1284.348498 | 1237.907130 | 1109.845420 | 1060.460740 | 1022.139041 | 948.873104 | 924.443739 | 899.405484 | 826.496690 | 862.963935 |
| Westminster | 3666.384755 | 3821.036803 | 3655.377778 | 3683.201324 | 3319.253557 | 3476.630699 | 3155.744091 | 3367.416556 | 3192.564856 | 2728.160101 | 2479.534026 | 2238.995706 | 2062.990819 | 1985.655839 | 1768.356274 | 1444.948575 | 1671.932170 |
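Here `pd.pivot_table` reshapes the long table (one row per authority and year) into a wide one (one row per authority, one column per year). Any repeated cells are aggregated with the mean. The minimal stdlib sketch below performs that reshape on three Grand Total values from the tables above. The function name is hypothetical. Unlike pandas, missing cells are simply absent here rather than filled with NaN.

```python
from collections import defaultdict

def pivot_mean(records, index, columns, values):
    """Long-to-wide reshape: {index_value: {column_value: mean of values}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for r in records:
        i, c = r[index], r[columns]
        sums[i][c] += r[values]
        counts[i][c] += 1
    return {i: {c: sums[i][c] / counts[i][c] for c in sorted(sums[i])}
            for i in sorted(sums)}

records = [
    {"Local Authority": "Barking and Dagenham", "Calendar Year": 2005, "Grand Total": 1143.769451},
    {"Local Authority": "Barking and Dagenham", "Calendar Year": 2006, "Grand Total": 1137.669474},
    {"Local Authority": "Westminster", "Calendar Year": 2021, "Grand Total": 1671.932170},
]
wide = pivot_mean(records, "Local Authority", "Calendar Year", "Grand Total")
print(wide["Westminster"][2021])  # 1671.93217
```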
# Calculate the yearly total emissions across all London authorities
# (drop any existing 'Total' row first, so re-running this cell does not double-count)
yearly_totals = London_local_authority_emissions.drop(index='Total', errors='ignore').sum()
# Append the totals as a new row to the DataFrame
London_local_authority_emissions.loc['Total'] = yearly_totals
London_local_authority_emissions
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Authority | |||||||||||||||||
| Barking and Dagenham | 1143.769451 | 1137.669474 | 1129.493177 | 1059.170470 | 1027.842511 | 1074.458911 | 1063.789683 | 989.485100 | 916.863501 | 818.600754 | 751.829766 | 763.987306 | 719.250628 | 727.762858 | 768.496793 | 658.182793 | 713.257071 |
| Barnet | 1935.738379 | 1936.108212 | 1881.730350 | 1895.947916 | 1930.874717 | 1991.839989 | 1803.398317 | 1885.286564 | 1845.016535 | 1648.066549 | 1559.627617 | 1508.335748 | 1465.309243 | 1424.374426 | 1384.449107 | 1259.882102 | 1335.292920 |
| Bexley | 1401.965109 | 1375.118185 | 1317.338848 | 1290.522468 | 1202.145062 | 1308.786002 | 1135.618513 | 1201.956255 | 1154.077529 | 1063.552030 | 1036.689931 | 983.151027 | 929.335144 | 861.388300 | 841.844776 | 773.041608 | 815.969051 |
| Brent | 1658.208154 | 1610.531981 | 1612.339933 | 1676.850623 | 1706.601914 | 1656.682391 | 1461.257729 | 1682.292163 | 1631.283669 | 1363.327488 | 1267.429188 | 1221.079306 | 1204.987810 | 1170.613528 | 1079.773201 | 1016.575515 | 1074.536509 |
| Bromley | 1721.234100 | 1744.069163 | 1655.604912 | 1618.384935 | 1558.309805 | 1658.285845 | 1446.404601 | 1515.517494 | 1467.472313 | 1316.641882 | 1275.347526 | 1211.746038 | 1130.341644 | 1110.521398 | 1050.743698 | 953.423486 | 1015.820513 |
| Camden | 1877.355270 | 1980.016766 | 1922.148658 | 1883.867937 | 1702.861352 | 1755.224626 | 1608.445972 | 1707.674042 | 1639.823685 | 1414.589094 | 1306.655293 | 1221.825268 | 1114.872030 | 1088.164854 | 1008.814953 | 896.919156 | 1024.262763 |
| City of London | 1729.953162 | 1856.477516 | 1746.600671 | 1759.470248 | 1551.279506 | 1711.862311 | 1482.148882 | 1681.313826 | 1490.594281 | 1232.184891 | 1082.514456 | 905.430588 | 800.698851 | 782.393720 | 747.632764 | 627.448608 | 636.123249 |
| Croydon | 1931.756219 | 1885.574789 | 1797.101804 | 1750.064513 | 1660.492021 | 1799.915324 | 1568.043628 | 1674.327417 | 1569.004236 | 1377.384089 | 1324.491777 | 1265.995790 | 1202.474716 | 1189.143001 | 1129.272704 | 1053.903040 | 1133.503357 |
| Ealing | 1951.198985 | 1960.648553 | 1935.712137 | 1950.860111 | 1701.818270 | 1717.703808 | 1614.173621 | 1726.009810 | 1702.771238 | 1524.929566 | 1434.319509 | 1330.810411 | 1290.366345 | 1249.409822 | 1188.201957 | 1068.271237 | 1161.789919 |
| Enfield | 1748.232280 | 1879.598351 | 1880.963693 | 1709.688882 | 1611.718264 | 1675.202836 | 1494.829223 | 1603.918468 | 1524.868584 | 1392.830088 | 1383.876579 | 1324.108205 | 1249.082898 | 1236.464489 | 1183.076303 | 1132.513375 | 1235.139215 |
| Greenwich | 1416.217430 | 1399.930012 | 1352.399745 | 1358.336912 | 1286.963813 | 1367.163039 | 1248.558247 | 1229.056708 | 1141.496846 | 1150.469552 | 1037.972925 | 1003.465419 | 956.880814 | 876.691542 | 833.521963 | 779.530737 | 1005.375305 |
| Hackney | 1015.090396 | 1019.945637 | 997.809376 | 1001.874644 | 933.836421 | 951.613807 | 865.224889 | 955.281777 | 927.882351 | 817.697155 | 776.829266 | 726.695101 | 687.534745 | 658.318386 | 627.945840 | 564.444372 | 603.036087 |
| Hammersmith and Fulham | 1164.287846 | 1182.120138 | 1150.898762 | 1136.048815 | 1097.033177 | 1126.681085 | 1022.357948 | 1066.910579 | 1020.801818 | 886.021370 | 834.117936 | 778.062046 | 727.240760 | 706.517525 | 656.145366 | 584.334299 | 629.409587 |
| Haringey | 1150.764337 | 1151.937818 | 1100.946609 | 1132.659709 | 1023.782785 | 1032.509481 | 935.203949 | 982.412092 | 949.501300 | 843.402558 | 807.872375 | 770.459457 | 722.862852 | 704.308155 | 670.637200 | 623.277807 | 654.153606 |
| Harrow | 1157.228225 | 1134.624330 | 1106.392111 | 1137.399568 | 959.983249 | 1007.040283 | 903.490192 | 958.926033 | 922.261148 | 813.656352 | 781.637668 | 734.580010 | 690.318120 | 681.603271 | 658.230337 | 605.893547 | 619.922133 |
| Havering | 1601.459562 | 1598.010172 | 1544.703518 | 1483.401713 | 1382.402894 | 1416.720720 | 1295.381136 | 1278.794554 | 1258.665992 | 1180.899002 | 1248.646299 | 1140.345852 | 1094.380787 | 1094.881718 | 1098.686036 | 1015.769666 | 1068.894001 |
| Hillingdon | 2548.840600 | 2525.184957 | 2415.811059 | 2560.369310 | 2437.150973 | 2382.919921 | 2135.312249 | 2434.072324 | 2326.560077 | 1950.535081 | 1794.165112 | 1715.418258 | 1647.689809 | 1679.468281 | 1636.343960 | 1333.897331 | 1480.746188 |
| Hounslow | 1772.855740 | 1797.537911 | 1746.773293 | 1711.494746 | 1603.351410 | 1663.511505 | 1540.156144 | 1637.340729 | 1650.378602 | 1396.592922 | 1331.466709 | 1260.571268 | 1183.981985 | 1157.942074 | 1100.808422 | 941.553356 | 1014.384896 |
| Islington | 1299.180237 | 1332.113342 | 1299.171071 | 1325.751925 | 1206.337606 | 1236.015406 | 1099.386756 | 1186.656127 | 1107.362179 | 980.679548 | 902.989868 | 834.085166 | 786.642161 | 747.586416 | 679.912491 | 591.439432 | 647.611949 |
| Kensington and Chelsea | 1460.147803 | 1491.438543 | 1430.694540 | 1437.673702 | 1359.951614 | 1481.545658 | 1383.183063 | 1416.710254 | 1286.186476 | 1155.131148 | 1082.529289 | 963.693486 | 853.190474 | 820.022486 | 771.872211 | 676.400616 | 745.197494 |
| Kingston upon Thames | 954.370986 | 937.979827 | 911.299808 | 883.782973 | 845.347212 | 904.659399 | 788.772809 | 827.614950 | 805.684113 | 720.475253 | 699.599430 | 662.348984 | 627.061547 | 618.774868 | 595.144486 | 537.461326 | 580.307966 |
| Lambeth | 1591.531704 | 1589.885272 | 1533.699620 | 1525.115866 | 1404.945696 | 1487.334689 | 1307.766273 | 1373.170371 | 1338.415324 | 1178.332639 | 1123.105217 | 1057.054317 | 974.528754 | 958.136406 | 935.764679 | 869.575179 | 926.399530 |
| Lewisham | 1312.216548 | 1288.290759 | 1246.585915 | 1185.595845 | 1172.346263 | 1206.629859 | 1023.160788 | 1094.093279 | 1035.845945 | 980.412736 | 1041.195980 | 1071.542548 | 980.777232 | 885.387878 | 882.418783 | 718.296072 | 708.031141 |
| Merton | 1007.609326 | 988.472542 | 940.719068 | 1015.357564 | 963.685229 | 1029.989651 | 894.164062 | 938.417788 | 924.611568 | 829.852980 | 782.882984 | 757.408641 | 718.646925 | 714.676566 | 675.056968 | 614.463470 | 671.736286 |
| Newham | 1744.630226 | 1830.620132 | 1818.882395 | 1768.959537 | 1609.487202 | 1757.408490 | 1709.403699 | 1771.347903 | 1662.731646 | 1415.644378 | 1329.326402 | 1232.633135 | 1174.241991 | 1126.370788 | 1038.987877 | 922.088727 | 965.856167 |
| Redbridge | 1432.459955 | 1405.229768 | 1383.728645 | 1311.253544 | 1105.044707 | 1131.284648 | 1051.800533 | 1095.503833 | 1054.764229 | 957.742555 | 931.022889 | 893.056034 | 845.318100 | 842.031746 | 803.314860 | 740.645195 | 763.714114 |
| Richmond upon Thames | 1201.782909 | 1187.601188 | 1141.738613 | 1136.712776 | 978.363774 | 1032.344108 | 886.296299 | 952.667571 | 925.196965 | 815.919905 | 768.059699 | 727.257056 | 681.405923 | 667.002956 | 630.351139 | 574.260030 | 626.475697 |
| Southwark | 1924.908302 | 1921.816540 | 1867.658966 | 1837.044751 | 1716.097388 | 1822.044174 | 1578.813854 | 1770.222838 | 1647.746486 | 1450.835980 | 1398.612567 | 1285.978929 | 1145.315209 | 1071.835330 | 973.364347 | 834.861444 | 1186.671824 |
| Sutton | 968.323516 | 964.991571 | 931.097147 | 911.486700 | 890.105230 | 983.190574 | 837.676471 | 886.362736 | 849.687163 | 760.515268 | 725.267861 | 687.678530 | 660.063815 | 644.071389 | 612.751743 | 578.803760 | 618.062119 |
| Tower Hamlets | 2158.423511 | 2503.448121 | 2468.663440 | 2508.170331 | 2349.105439 | 2434.962191 | 2181.838489 | 2251.297375 | 2088.404756 | 1761.892820 | 1699.440759 | 1530.286529 | 1360.826182 | 1325.014601 | 1201.178300 | 1029.998327 | 1101.739705 |
| Waltham Forest | 1099.928040 | 1102.047793 | 1061.292407 | 1086.986978 | 1032.952970 | 1062.730408 | 969.089749 | 978.832988 | 965.824070 | 848.895931 | 808.178821 | 785.600762 | 774.148119 | 766.787364 | 760.130735 | 627.090537 | 649.847182 |
| Wandsworth | 1628.392425 | 1583.779240 | 1505.198016 | 1400.135772 | 1350.976697 | 1424.251272 | 1217.326252 | 1284.348498 | 1237.907130 | 1109.845420 | 1060.460740 | 1022.139041 | 948.873104 | 924.443739 | 899.405484 | 826.496690 | 862.963935 |
| Westminster | 3666.384755 | 3821.036803 | 3655.377778 | 3683.201324 | 3319.253557 | 3476.630699 | 3155.744091 | 3367.416556 | 3192.564856 | 2728.160101 | 2479.534026 | 2238.995706 | 2062.990819 | 1985.655839 | 1768.356274 | 1444.948575 | 1671.932170 |
| Total | 52376.445488 | 53123.855407 | 51490.576089 | 51133.643106 | 47682.448729 | 49769.143112 | 44708.218113 | 47405.239005 | 45262.256614 | 39885.717086 | 37867.696463 | 35615.825963 | 33411.639534 | 32497.765721 | 30892.635757 | 27475.691415 | 29948.163651 |
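The 'Total' row makes it easy to summarise the trend. The stdlib sketch below computes the overall percentage change between 2005 and 2021 and the equivalent compound annual rate over the 16 yearly intervals. The helper names are illustrative, and the inputs are the two endpoint values from the table above.

```python
def pct_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0

def cagr(start, end, intervals):
    """Compound annual growth rate over `intervals` years, as a percentage."""
    return ((end / start) ** (1.0 / intervals) - 1.0) * 100.0

# London 'Total' row from the table above (kt CO2e)
total_2005 = 52376.445488
total_2021 = 29948.163651

print(f"Change 2005 to 2021: {pct_change(total_2005, total_2021):.1f}%")     # -42.8%
print(f"Average annual change: {cagr(total_2005, total_2021, 16):.1f}%")     # -3.4%
```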
import matplotlib.pyplot as plt
# Sum over the authorities only: the appended 'Total' row must be excluded, or every year is counted twice
yearly_totals = London_local_authority_emissions.drop(index='Total', errors='ignore').sum()
plt.figure(figsize=(10, 6))
yearly_totals.plot(kind='line', marker='o')
plt.title('Yearly Total Emissions in London')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.grid(True)
plt.show()
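Summing a table that already contains an appended totals row counts every authority twice, so each yearly value would come out at double the true figure. The stdlib illustration below uses two authorities with hypothetical numbers. The helper name `column_totals` is also hypothetical.

```python
def column_totals(table, exclude=("Total",)):
    """Sum each year's column over all rows except those named in `exclude`."""
    years = next(iter(table.values())).keys()
    return {y: sum(row[y] for name, row in table.items() if name not in exclude)
            for y in years}

# Hypothetical emissions (kt CO2e) for two authorities
table = {
    "Authority A": {2020: 100.0, 2021: 110.0},
    "Authority B": {2020: 50.0, 2021: 40.0},
}
table["Total"] = column_totals(table)       # append a totals row
print(table["Total"])                       # {2020: 150.0, 2021: 150.0}
doubled = column_totals(table, exclude=())  # includes the 'Total' row
print(doubled)                              # {2020: 300.0, 2021: 300.0}
print(column_totals(table))                 # {2020: 150.0, 2021: 150.0}
```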
Wales
# Filter the data for the "Wales" Region/Country
Wales_data = data_1_1_actual[data_1_1_actual['Region/Country'] == 'Wales']
Wales_data
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | Industry Electricity | Industry Gas | Large Industrial Installations | Industry 'Other' | Industry Total | ... | Agriculture Soils | Agriculture Total | Landfill | Waste Management 'Other' | Waste Management Total | Grand Total | Population ('000s, mid-year estimate) | Per Capita Emissions (tCO2e) | Area (km2) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5899 | Wales | Wales | Blaenau Gwent | W06000019 | 2005 | 65.59808 | 30.923927 | 0.910503 | 31.863069 | 129.295578 | ... | 1.393334 | 9.764239 | 101.06659 | 3.698073 | 104.764663 | 590.397163 | 69.188 | 8.533231 | 108.7279 | 5.430043 |
| 5900 | Wales | Wales | Blaenau Gwent | W06000019 | 2006 | 68.297658 | 37.994917 | 0.882636 | 33.684962 | 140.860173 | ... | 1.312323 | 9.408435 | 94.259989 | 4.270126 | 98.530116 | 594.764649 | 69.61 | 8.544241 | 108.7279 | 5.470212 |
| 5901 | Wales | Wales | Blaenau Gwent | W06000019 | 2007 | 62.6949 | 35.219753 | 0.895169 | 32.121519 | 130.931341 | ... | 1.306047 | 9.717945 | 99.165453 | 4.397067 | 103.56252 | 576.227727 | 69.685 | 8.269035 | 108.7279 | 5.299723 |
| 5902 | Wales | Wales | Blaenau Gwent | W06000019 | 2008 | 66.289964 | 35.527274 | 0.850555 | 30.52934 | 133.197132 | ... | 1.132106 | 9.145378 | 75.383486 | 4.401133 | 79.784619 | 559.320474 | 69.82 | 8.010892 | 108.7279 | 5.144222 |
| 5903 | Wales | Wales | Blaenau Gwent | W06000019 | 2009 | 54.565764 | 24.645645 | 0.868735 | 30.739822 | 110.819967 | ... | 1.198329 | 8.843912 | 66.026656 | 4.155314 | 70.18197 | 494.419940 | 69.85 | 7.07831 | 108.7279 | 4.547314 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6268 | Wales | Wales | Wrexham | W06000006 | 2017 | 138.291306 | 215.451935 | 6.162685 | 58.279959 | 418.185885 | ... | 24.874715 | 226.456449 | 38.64809 | 13.623222 | 52.271312 | 1296.506128 | 137.024688 | 9.461843 | 503.7739 | 2.573587 |
| 6269 | Wales | Wales | Wrexham | W06000006 | 2018 | 124.951001 | 262.003868 | 6.784681 | 60.623925 | 454.363475 | ... | 24.210341 | 219.62436 | 31.887202 | 14.560996 | 46.448198 | 1269.213023 | 137.113661 | 9.256649 | 503.7739 | 2.51941 |
| 6270 | Wales | Wales | Wrexham | W06000006 | 2019 | 114.083503 | 227.250945 | 10.291972 | 58.155464 | 409.781884 | ... | 23.919956 | 209.605174 | 36.983958 | 17.280628 | 54.264586 | 1206.602551 | 136.859706 | 8.816346 | 503.7739 | 2.395127 |
| 6271 | Wales | Wales | Wrexham | W06000006 | 2020 | 87.7227 | 220.889613 | 8.987525 | 57.01724 | 374.617077 | ... | 22.221266 | 229.931955 | 20.0322 | 16.058294 | 36.090495 | 1102.897206 | 136.391201 | 8.086278 | 503.7739 | 2.18927 |
| 6272 | Wales | Wales | Wrexham | W06000006 | 2021 | 95.762223 | 296.66666 | 9.2927 | 60.24056 | 461.962144 | ... | 23.31838 | 216.993751 | 20.752061 | 13.022652 | 33.774714 | 1207.502681 | 135.132 | 8.935727 | 503.7739 | 2.396914 |
374 rows × 50 columns
# Filter out rows where "Second Tier Authority" ends with "Total"
Wales_data_filtered = Wales_data[~Wales_data['Second Tier Authority'].str.endswith("Total")]
# List of columns to keep
columns_to_keep = ['Region/Country', 'Second Tier Authority', 'Local Authority', 'Local Authority Code', 'Calendar Year', 'LULUCF Net Emissions']
# Add columns that end with "Total"
columns_to_keep.extend([col for col in Wales_data_filtered.columns if col.endswith("Total")])
Wales_filtered_columns = Wales_data_filtered[columns_to_keep]
Wales_filtered_columns.to_csv('Wales_filtered_columns.csv')
Wales_filtered_columns
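The London and Wales cells repeat the same filter, column selection, and CSV export, changing only the region name. One way to avoid the copy-paste is a single helper parameterised by region. To keep it runnable without pandas, the sketch below expresses the row filter on plain dict records. The function name and the 'Wales Total' record are hypothetical. In the notebook, the same logic would operate on `data_1_1_actual`.

```python
def region_rows(records, region):
    """Rows for one Region/Country, excluding 'Second Tier Authority' subtotal rows."""
    return [r for r in records
            if r["Region/Country"] == region
            and not str(r["Second Tier Authority"]).endswith("Total")]

records = [
    {"Region/Country": "Wales", "Second Tier Authority": "Wales", "Local Authority": "Wrexham"},
    {"Region/Country": "Wales", "Second Tier Authority": "Wales Total", "Local Authority": "Wales Total"},
    {"Region/Country": "London", "Second Tier Authority": "Westminster", "Local Authority": "Westminster"},
]
wales = region_rows(records, "Wales")
print([r["Local Authority"] for r in wales])  # ['Wrexham']
```

In pandas, `str.endswith("Total", na=False)` plays the same role as the `str()` guard here, treating missing values as non-matching instead of producing NaN.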
| 3 | Region/Country | Second Tier Authority | Local Authority | Local Authority Code | Calendar Year | LULUCF Net Emissions | Industry Total | Commercial Total | Public Sector Total | Domestic Total | Transport Total | Agriculture Total | Waste Management Total | Grand Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5899 | Wales | Wales | Blaenau Gwent | W06000019 | 2005 | -0.327826 | 129.295578 | 32.730794 | 18.562459 | 195.049517 | 100.55774 | 9.764239 | 104.764663 | 590.397163 |
| 5900 | Wales | Wales | Blaenau Gwent | W06000019 | 2006 | -0.808447 | 140.860173 | 35.206249 | 20.236768 | 193.346979 | 97.984376 | 9.408435 | 98.530116 | 594.764649 |
| 5901 | Wales | Wales | Blaenau Gwent | W06000019 | 2007 | -0.645698 | 130.931341 | 32.374593 | 18.622168 | 181.66203 | 100.002829 | 9.717945 | 103.56252 | 576.227727 |
| 5902 | Wales | Wales | Blaenau Gwent | W06000019 | 2008 | -0.80085 | 133.197132 | 33.935402 | 18.892512 | 186.674254 | 98.492027 | 9.145378 | 79.784619 | 559.320474 |
| 5903 | Wales | Wales | Blaenau Gwent | W06000019 | 2009 | -1.145458 | 110.819967 | 27.19311 | 14.499577 | 167.845082 | 96.18178 | 8.843912 | 70.18197 | 494.419940 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6268 | Wales | Wales | Wrexham | W06000006 | 2017 | 30.759367 | 418.185885 | 72.175017 | 25.829278 | 213.93879 | 256.89003 | 226.456449 | 52.271312 | 1296.506128 |
| 6269 | Wales | Wales | Wrexham | W06000006 | 2018 | 29.9354 | 454.363475 | 25.79469 | 23.23909 | 211.829126 | 257.978684 | 219.62436 | 46.448198 | 1269.213023 |
| 6270 | Wales | Wales | Wrexham | W06000006 | 2019 | 29.062392 | 409.781884 | 22.563667 | 22.568689 | 203.705944 | 255.050215 | 209.605174 | 54.264586 | 1206.602551 |
| 6271 | Wales | Wales | Wrexham | W06000006 | 2020 | 27.78471 | 374.617077 | 18.213058 | 19.070547 | 196.043477 | 201.145887 | 229.931955 | 36.090495 | 1102.897206 |
| 6272 | Wales | Wales | Wrexham | W06000006 | 2021 | 26.685561 | 461.962144 | 21.63301 | 20.777525 | 201.075478 | 224.600499 | 216.993751 | 33.774714 | 1207.502681 |
374 rows × 14 columns
# Pivot the Wales local authority Grand Total emissions: one row per authority, one column per year
Wales_local_authority_emissions = pd.pivot_table(Wales_filtered_columns, values='Grand Total', index=['Local Authority'], columns=['Calendar Year'])
Wales_local_authority_emissions
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Authority | |||||||||||||||||
| Blaenau Gwent | 590.397163 | 594.764649 | 576.227727 | 559.320474 | 494.419940 | 506.901265 | 428.349653 | 450.003214 | 451.114407 | 389.828472 | 534.622710 | 390.962484 | 557.984657 | 555.495679 | 350.923958 | 326.900103 | 363.187488 |
| Bridgend | 1359.583320 | 1328.278136 | 1283.493003 | 1263.946946 | 1173.890622 | 1176.357606 | 1095.250276 | 1113.884061 | 1103.423135 | 1031.194944 | 984.409370 | 958.793719 | 929.398390 | 939.311405 | 903.755106 | 827.739899 | 864.513347 |
| Caerphilly | 1491.891676 | 1466.628930 | 1403.504827 | 1330.582166 | 1221.059679 | 1253.815468 | 1065.323173 | 1126.135522 | 1120.623853 | 1014.975553 | 953.279500 | 926.438348 | 870.974195 | 880.193592 | 849.067720 | 788.498128 | 813.616539 |
| Cardiff | 3698.847519 | 3411.240239 | 3306.860621 | 3193.118687 | 2969.365021 | 2913.888084 | 2445.316260 | 2624.123187 | 2519.808130 | 2228.403312 | 2107.776243 | 2042.344438 | 1968.407936 | 1871.686889 | 1803.751711 | 1594.449034 | 1788.013176 |
| Carmarthenshire | 2602.104764 | 2570.189236 | 2518.959135 | 2471.906750 | 2266.983363 | 2353.432116 | 2106.291325 | 2160.298425 | 2173.361058 | 2119.475047 | 2010.904061 | 1951.002726 | 1934.073796 | 1922.563658 | 1843.567559 | 1683.576211 | 1768.723320 |
| Ceredigion | 1126.256377 | 1101.837573 | 1058.937582 | 1040.970718 | 997.238606 | 1037.376865 | 979.918183 | 1028.805758 | 1001.344324 | 987.051155 | 960.994072 | 924.869676 | 923.148603 | 917.880254 | 897.652109 | 865.645897 | 892.027474 |
| Conwy | 1235.264422 | 1213.764301 | 1125.155795 | 1089.955240 | 1027.964417 | 1054.812419 | 937.861289 | 982.688234 | 948.853641 | 895.926334 | 881.401204 | 877.671760 | 867.043857 | 845.729893 | 821.736378 | 716.245973 | 782.907595 |
| Denbighshire | 1037.682654 | 1043.320125 | 974.457215 | 958.170492 | 896.630095 | 911.909382 | 828.091079 | 855.911724 | 850.269080 | 806.030525 | 786.327564 | 777.915944 | 763.359577 | 752.154286 | 721.769691 | 659.262986 | 714.669026 |
| Flintshire | 2605.371511 | 2986.886044 | 2860.705377 | 2645.697658 | 2212.047012 | 2138.727258 | 1963.600766 | 2230.682165 | 2211.075206 | 2136.340525 | 2084.301597 | 2053.014452 | 1877.574502 | 1973.767337 | 1882.334109 | 1739.163131 | 1898.892354 |
| Gwynedd | 1624.592759 | 1600.077478 | 1499.865799 | 1387.189173 | 1337.827245 | 1365.531674 | 1234.055245 | 1301.921387 | 1263.954141 | 1303.961508 | 1192.316981 | 1165.501743 | 1137.086200 | 1080.936331 | 1073.336818 | 947.877993 | 1005.856037 |
| Isle of Anglesey | 1117.058819 | 1108.004301 | 1040.619441 | 988.607137 | 885.864240 | 779.281527 | 704.367157 | 719.616464 | 700.504042 | 661.764583 | 638.375923 | 614.055607 | 614.385181 | 606.228957 | 591.154553 | 550.439913 | 592.799558 |
| Merthyr Tydfil | 515.601440 | 513.300545 | 487.687265 | 471.836465 | 424.200746 | 428.794167 | 365.600909 | 381.307059 | 368.808966 | 326.421540 | 323.735918 | 338.528876 | 309.589782 | 302.058578 | 307.320737 | 271.230900 | 274.662125 |
| Monmouthshire | 1215.100255 | 1153.962300 | 1138.193734 | 1141.699561 | 1074.428002 | 1108.142833 | 1039.640713 | 1045.764186 | 1032.016335 | 980.227500 | 937.622798 | 930.270905 | 917.695750 | 899.697814 | 900.226576 | 809.715044 | 852.139214 |
| Neath Port Talbot | 8397.242394 | 8675.849163 | 9185.078825 | 8658.722681 | 6612.024925 | 8956.374552 | 7962.577083 | 6302.730722 | 9165.745383 | 9225.378712 | 8488.614057 | 7564.528681 | 7765.870984 | 6922.494197 | 7449.034806 | 6932.422481 | 7115.435625 |
| Newport | 2548.432964 | 2594.748311 | 2362.596643 | 2198.052882 | 1843.604930 | 1951.700363 | 1677.179751 | 1594.614057 | 1661.401483 | 1578.546766 | 1431.460710 | 1360.483300 | 1198.607769 | 1234.553432 | 1161.241485 | 1037.135400 | 1134.351909 |
| Pembrokeshire | 2156.112223 | 2156.710859 | 2116.327429 | 2053.474018 | 2050.219527 | 2034.014478 | 1900.570278 | 1905.002553 | 1860.192453 | 1776.415908 | 1722.180112 | 1643.235974 | 1615.194492 | 1565.636645 | 1559.635109 | 1450.372099 | 1488.610856 |
| Powys | 2391.303577 | 2367.657499 | 2251.807570 | 2143.534926 | 2055.430330 | 2138.586231 | 1996.586736 | 2075.615066 | 1995.746169 | 1985.609495 | 1959.505504 | 1945.507984 | 1940.037371 | 1862.815152 | 1825.251302 | 1700.858212 | 1746.334340 |
| Rhondda Cynon Taf | 2052.789179 | 2026.559884 | 1959.689670 | 1775.125008 | 1675.182762 | 1753.329228 | 1511.262605 | 1558.142019 | 1605.824721 | 1501.662228 | 1354.173606 | 1305.485364 | 1238.411697 | 1229.624442 | 1186.003849 | 1043.900566 | 1116.542455 |
| Swansea | 2214.178239 | 2206.585074 | 2042.155416 | 1994.768164 | 1804.555497 | 1856.536451 | 1575.467083 | 1626.219109 | 1558.334755 | 1413.087073 | 1350.815169 | 1326.827412 | 1322.683528 | 1273.741738 | 1270.630243 | 1143.012877 | 1179.851424 |
| Torfaen | 757.604676 | 754.329911 | 708.394730 | 730.888745 | 677.235185 | 689.494187 | 624.886068 | 652.483152 | 613.156879 | 552.893975 | 526.712757 | 502.538224 | 478.549203 | 477.045214 | 463.564869 | 411.702835 | 436.240804 |
| Vale of Glamorgan | 1806.882158 | 1744.202955 | 1754.105112 | 1665.188694 | 1485.307101 | 1532.067292 | 1482.858073 | 1476.054129 | 1489.634271 | 1451.955129 | 1402.914051 | 1337.654912 | 1311.235236 | 1295.497727 | 1276.583588 | 1165.223391 | 1206.448627 |
| Wrexham | 2003.171855 | 2026.335823 | 1915.584024 | 1867.223062 | 1741.254914 | 1809.813490 | 1575.432300 | 1625.809205 | 1555.331115 | 1383.680144 | 1352.549024 | 1227.059397 | 1296.506128 | 1269.213023 | 1206.602551 | 1102.897206 | 1207.502681 |
# Calculate the yearly total emissions for each year
# (drop any existing 'Total' row first so re-running this cell does not double count)
yearly_totals = Wales_local_authority_emissions.drop(index='Total', errors='ignore').sum()
# Append the totals as a new row to the DataFrame
Wales_local_authority_emissions.loc['Total'] = yearly_totals
Wales_local_authority_emissions
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Authority | |||||||||||||||||
| Blaenau Gwent | 590.397163 | 594.764649 | 576.227727 | 559.320474 | 494.419940 | 506.901265 | 428.349653 | 450.003214 | 451.114407 | 389.828472 | 534.622710 | 390.962484 | 557.984657 | 555.495679 | 350.923958 | 326.900103 | 363.187488 |
| Bridgend | 1359.583320 | 1328.278136 | 1283.493003 | 1263.946946 | 1173.890622 | 1176.357606 | 1095.250276 | 1113.884061 | 1103.423135 | 1031.194944 | 984.409370 | 958.793719 | 929.398390 | 939.311405 | 903.755106 | 827.739899 | 864.513347 |
| Caerphilly | 1491.891676 | 1466.628930 | 1403.504827 | 1330.582166 | 1221.059679 | 1253.815468 | 1065.323173 | 1126.135522 | 1120.623853 | 1014.975553 | 953.279500 | 926.438348 | 870.974195 | 880.193592 | 849.067720 | 788.498128 | 813.616539 |
| Cardiff | 3698.847519 | 3411.240239 | 3306.860621 | 3193.118687 | 2969.365021 | 2913.888084 | 2445.316260 | 2624.123187 | 2519.808130 | 2228.403312 | 2107.776243 | 2042.344438 | 1968.407936 | 1871.686889 | 1803.751711 | 1594.449034 | 1788.013176 |
| Carmarthenshire | 2602.104764 | 2570.189236 | 2518.959135 | 2471.906750 | 2266.983363 | 2353.432116 | 2106.291325 | 2160.298425 | 2173.361058 | 2119.475047 | 2010.904061 | 1951.002726 | 1934.073796 | 1922.563658 | 1843.567559 | 1683.576211 | 1768.723320 |
| Ceredigion | 1126.256377 | 1101.837573 | 1058.937582 | 1040.970718 | 997.238606 | 1037.376865 | 979.918183 | 1028.805758 | 1001.344324 | 987.051155 | 960.994072 | 924.869676 | 923.148603 | 917.880254 | 897.652109 | 865.645897 | 892.027474 |
| Conwy | 1235.264422 | 1213.764301 | 1125.155795 | 1089.955240 | 1027.964417 | 1054.812419 | 937.861289 | 982.688234 | 948.853641 | 895.926334 | 881.401204 | 877.671760 | 867.043857 | 845.729893 | 821.736378 | 716.245973 | 782.907595 |
| Denbighshire | 1037.682654 | 1043.320125 | 974.457215 | 958.170492 | 896.630095 | 911.909382 | 828.091079 | 855.911724 | 850.269080 | 806.030525 | 786.327564 | 777.915944 | 763.359577 | 752.154286 | 721.769691 | 659.262986 | 714.669026 |
| Flintshire | 2605.371511 | 2986.886044 | 2860.705377 | 2645.697658 | 2212.047012 | 2138.727258 | 1963.600766 | 2230.682165 | 2211.075206 | 2136.340525 | 2084.301597 | 2053.014452 | 1877.574502 | 1973.767337 | 1882.334109 | 1739.163131 | 1898.892354 |
| Gwynedd | 1624.592759 | 1600.077478 | 1499.865799 | 1387.189173 | 1337.827245 | 1365.531674 | 1234.055245 | 1301.921387 | 1263.954141 | 1303.961508 | 1192.316981 | 1165.501743 | 1137.086200 | 1080.936331 | 1073.336818 | 947.877993 | 1005.856037 |
| Isle of Anglesey | 1117.058819 | 1108.004301 | 1040.619441 | 988.607137 | 885.864240 | 779.281527 | 704.367157 | 719.616464 | 700.504042 | 661.764583 | 638.375923 | 614.055607 | 614.385181 | 606.228957 | 591.154553 | 550.439913 | 592.799558 |
| Merthyr Tydfil | 515.601440 | 513.300545 | 487.687265 | 471.836465 | 424.200746 | 428.794167 | 365.600909 | 381.307059 | 368.808966 | 326.421540 | 323.735918 | 338.528876 | 309.589782 | 302.058578 | 307.320737 | 271.230900 | 274.662125 |
| Monmouthshire | 1215.100255 | 1153.962300 | 1138.193734 | 1141.699561 | 1074.428002 | 1108.142833 | 1039.640713 | 1045.764186 | 1032.016335 | 980.227500 | 937.622798 | 930.270905 | 917.695750 | 899.697814 | 900.226576 | 809.715044 | 852.139214 |
| Neath Port Talbot | 8397.242394 | 8675.849163 | 9185.078825 | 8658.722681 | 6612.024925 | 8956.374552 | 7962.577083 | 6302.730722 | 9165.745383 | 9225.378712 | 8488.614057 | 7564.528681 | 7765.870984 | 6922.494197 | 7449.034806 | 6932.422481 | 7115.435625 |
| Newport | 2548.432964 | 2594.748311 | 2362.596643 | 2198.052882 | 1843.604930 | 1951.700363 | 1677.179751 | 1594.614057 | 1661.401483 | 1578.546766 | 1431.460710 | 1360.483300 | 1198.607769 | 1234.553432 | 1161.241485 | 1037.135400 | 1134.351909 |
| Pembrokeshire | 2156.112223 | 2156.710859 | 2116.327429 | 2053.474018 | 2050.219527 | 2034.014478 | 1900.570278 | 1905.002553 | 1860.192453 | 1776.415908 | 1722.180112 | 1643.235974 | 1615.194492 | 1565.636645 | 1559.635109 | 1450.372099 | 1488.610856 |
| Powys | 2391.303577 | 2367.657499 | 2251.807570 | 2143.534926 | 2055.430330 | 2138.586231 | 1996.586736 | 2075.615066 | 1995.746169 | 1985.609495 | 1959.505504 | 1945.507984 | 1940.037371 | 1862.815152 | 1825.251302 | 1700.858212 | 1746.334340 |
| Rhondda Cynon Taf | 2052.789179 | 2026.559884 | 1959.689670 | 1775.125008 | 1675.182762 | 1753.329228 | 1511.262605 | 1558.142019 | 1605.824721 | 1501.662228 | 1354.173606 | 1305.485364 | 1238.411697 | 1229.624442 | 1186.003849 | 1043.900566 | 1116.542455 |
| Swansea | 2214.178239 | 2206.585074 | 2042.155416 | 1994.768164 | 1804.555497 | 1856.536451 | 1575.467083 | 1626.219109 | 1558.334755 | 1413.087073 | 1350.815169 | 1326.827412 | 1322.683528 | 1273.741738 | 1270.630243 | 1143.012877 | 1179.851424 |
| Torfaen | 757.604676 | 754.329911 | 708.394730 | 730.888745 | 677.235185 | 689.494187 | 624.886068 | 652.483152 | 613.156879 | 552.893975 | 526.712757 | 502.538224 | 478.549203 | 477.045214 | 463.564869 | 411.702835 | 436.240804 |
| Vale of Glamorgan | 1806.882158 | 1744.202955 | 1754.105112 | 1665.188694 | 1485.307101 | 1532.067292 | 1482.858073 | 1476.054129 | 1489.634271 | 1451.955129 | 1402.914051 | 1337.654912 | 1311.235236 | 1295.497727 | 1276.583588 | 1165.223391 | 1206.448627 |
| Wrexham | 2003.171855 | 2026.335823 | 1915.584024 | 1867.223062 | 1741.254914 | 1809.813490 | 1575.432300 | 1625.809205 | 1555.331115 | 1383.680144 | 1352.549024 | 1227.059397 | 1296.506128 | 1269.213023 | 1206.602551 | 1102.897206 | 1207.502681 |
| Total | 44547.469945 | 44645.233337 | 43570.406943 | 41629.979644 | 36926.734158 | 39750.886937 | 35500.486005 | 34837.811399 | 37250.523547 | 35750.830428 | 33984.992931 | 32164.691926 | 31837.818836 | 30678.326243 | 30345.144827 | 27768.270278 | 29243.325974 |
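Once a 'Total' row has been appended, calling `.sum()` on the whole frame again counts every local authority twice, because the Total row is added on top of the rows it already summarises. A toy sketch with a hypothetical two-authority table (all values invented) shows the effect and the fix:

```python
import pandas as pd

# Hypothetical two-authority table (kt CO2e); columns are calendar years
table = pd.DataFrame({2005: [100.0, 50.0], 2006: [90.0, 45.0]}, index=['A', 'B'])

# Append a Total row, as the notebook does for Wales
table.loc['Total'] = table.sum()

# Summing again now includes the Total row as well as A and B
print(table.sum()[2005])  # 300.0

# Dropping the Total row first gives the correct yearly total
print(table.drop(index='Total').sum()[2005])  # 150.0
```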
import matplotlib.pyplot as plt
import pandas as pd
# Exclude the appended 'Total' row, otherwise every year would be counted twice
yearly_totals = Wales_local_authority_emissions.drop(index='Total', errors='ignore').sum()
plt.figure(figsize=(10, 6))
yearly_totals.plot(kind='line', marker='o')
plt.title('Yearly Total Emissions in Wales')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.grid(True)
plt.show()
yearly_totals
Calendar Year
2005    44547.469945
2006    44645.233337
2007    43570.406943
2008    41629.979644
2009    36926.734158
2010    39750.886937
2011    35500.486005
2012    34837.811399
2013    37250.523547
2014    35750.830428
2015    33984.992931
2016    32164.691926
2017    31837.818836
2018    30678.326243
2019    30345.144827
2020    27768.270278
2021    29243.325974
dtype: float64
Per capita emissions are an area's total greenhouse gas emissions divided by its population, i.e. the average emissions attributable to each resident. The dataset reports emissions as CO2 equivalent (CO2e), so the figures cover greenhouse gases generally, not carbon dioxide alone. Because 'Grand Total' is in kt CO2e and population is in thousands, the ratio comes out in tonnes of CO2e per person.
Average emissions per km² measure the density of emissions in an area: total emissions divided by the area in square kilometres (km²). With 'Grand Total' in kt CO2e and 'Area (km2)' in km², the result is in kt CO2e per km².
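The unit bookkeeping behind these two ratios can be checked with a pair of small helper functions. This is a sketch: the function names and the example figures are hypothetical, chosen only to show the arithmetic; the notebook itself computes the ratios directly on the 'Grand Total', population and 'Area (km2)' columns.

```python
def per_capita_tonnes(grand_total_kt: float, population_thousands: float) -> float:
    """Emissions per person in t CO2e.

    kt CO2e / thousand people == (1000 t) / (1000 people) == t per person,
    so no extra scaling factor is needed.
    """
    return grand_total_kt / population_thousands


def per_km2_kt(grand_total_kt: float, area_km2: float) -> float:
    """Emissions density in kt CO2e per km²."""
    return grand_total_kt / area_km2


# Hypothetical area: 1,200 kt CO2e, 150,000 residents, 400 km²
print(per_capita_tonnes(1200.0, 150.0))  # 8.0 (t CO2e per person)
print(per_km2_kt(1200.0, 400.0))         # 3.0 (kt CO2e per km²)
```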
# Per capita emissions: kt CO2e / thousands of people = t CO2e per person
data_1_1_actual['Per Capita Emissions'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce') / \
    pd.to_numeric(data_1_1_actual["Population ('000s, mid-year estimate)"], errors='coerce')
# Emissions density: kt CO2e per km^2
data_1_1_actual['Emissions per km^2'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce') / \
    pd.to_numeric(data_1_1_actual['Area (km2)'], errors='coerce')
# Group by 'Region/Country' and take the unweighted mean of the row-level ratios
# (every row counts equally, whatever its population or area)
average_emissions = data_1_1_actual.groupby('Region/Country')[['Per Capita Emissions', 'Emissions per km^2']].mean()
average_emissions
| 3 | Per Capita Emissions | Emissions per km^2 |
|---|---|---|
| Region/Country | ||
| East Midlands | 9.710848 | 5.318442 |
| East Midlands Total | 8.668612 | 2.514725 |
| East of England | 8.091197 | 5.769531 |
| East of England Total | 7.929507 | 2.398958 |
| England Total | 7.523039 | 3.022770 |
| London | 10.318067 | 47.155219 |
| London Total | 5.113713 | 26.209921 |
| National Total | 8.107857 | 2.072628 |
| North East | 11.400630 | 11.222972 |
| North East Total | 10.240767 | 3.060355 |
| North West | 9.260969 | 7.706417 |
| North West Total | 7.965210 | 3.794200 |
| Northern Ireland | 13.582651 | 3.095697 |
| Northern Ireland Total | 12.634336 | 1.605084 |
| Scotland | 11.779920 | 3.426134 |
| Scotland Total | 9.553573 | 0.630670 |
| South East | 6.804080 | 6.286119 |
| South East Total | 6.775691 | 3.038176 |
| South West | 8.215306 | 3.880700 |
| South West Total | 7.664305 | 1.685468 |
| Unallocated | NaN | NaN |
| Wales | 12.037397 | 4.060587 |
| Wales Total | 11.743721 | 1.691700 |
| West Midlands | 8.623030 | 7.402992 |
| West Midlands Total | 7.391437 | 3.215261 |
| Yorkshire and the Humber | 11.376461 | 5.033133 |
| Yorkshire and the Humber Total | 9.092107 | 3.103146 |
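Because the grouped values are means of per-row ratios, a small, sparsely populated authority weighs as much as a large city. That is one plausible reason why, for example, 'London' (10.3) and 'London Total' (5.1) differ so much in the table above, though this is an interpretation rather than something the notebook checks. A sketch on made-up numbers compares the unweighted mean with a population-weighted ratio of sums; the 'Population' column name and all values are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical rows: two local authorities in one region, one year each
rows = pd.DataFrame({
    'Region/Country': ['Region A', 'Region A'],
    'Grand Total': [100.0, 900.0],  # kt CO2e
    'Population': [10.0, 300.0],    # thousands of people
})
rows['Per Capita Emissions'] = rows['Grand Total'] / rows['Population']

# Unweighted: mean of the per-row ratios (what the groupby above computes)
unweighted = rows.groupby('Region/Country')['Per Capita Emissions'].mean()

# Population-weighted: total emissions divided by total population
sums = rows.groupby('Region/Country')[['Grand Total', 'Population']].sum()
weighted = sums['Grand Total'] / sums['Population']

print(unweighted['Region A'])  # (10.0 + 3.0) / 2 = 6.5
print(weighted['Region A'])    # 1000 / 310, roughly 3.23
```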
import pandas as pd
import numpy as np
from sklearn.manifold import TSNE
import seaborn as sns
import matplotlib.pyplot as plt
# Use the West Midlands table: one row per local authority, one column per year
data_transposed = west_midlands_local_authority_emissions
# Select numerical columns and exclude non-numerical data
numerical_data = data_transposed.select_dtypes(include=[np.number])
# Handle missing values by filling them with the mean of each column
numerical_data_filled = numerical_data.fillna(numerical_data.mean())
# The table has only ~30 rows, so use all of them; TSNE requires perplexity < n_samples
n_samples = len(numerical_data_filled)
perplexity = min(10, n_samples - 1)
# t-SNE transformation
tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
tsne_results = tsne.fit_transform(numerical_data_filled)
# Create a DataFrame for the t-SNE results, indexed by local authority
tsne_df = pd.DataFrame({
    'TSNE1': tsne_results[:, 0],
    'TSNE2': tsne_results[:, 1]
}, index=numerical_data_filled.index)
# Plot the t-SNE results
plt.figure(figsize=(12, 8))
sns.scatterplot(x='TSNE1', y='TSNE2', color='blue', edgecolor='black', data=tsne_df)
plt.title('t-SNE visualization of West Midlands local authority emissions')
plt.xlabel('TSNE Component 1')
plt.ylabel('TSNE Component 2')
plt.grid(True)
plt.show()
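Sampling is a poor fit for a table this small. `DataFrame.sample(frac=...)` draws `round(frac * n_rows)` rows, so 1% of ~30 rows is no rows at all, and scikit-learn's `TSNE` raises "perplexity must be less than n_samples" whenever its perplexity (default 30) is not smaller than the number of rows. A sketch of both points, using a random stand-in table and a hypothetical helper `choose_perplexity` (not part of scikit-learn):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 30 x 17 West Midlands table
rng = np.random.default_rng(0)
table = pd.DataFrame(rng.random((30, 17)))

# sample(frac=...) draws round(frac * n_rows) rows
print(len(table.sample(frac=0.01, random_state=0)))  # round(0.3) -> 0 rows
print(len(table.sample(frac=0.1, random_state=0)))   # round(3.0) -> 3 rows


def choose_perplexity(n_samples: int, requested: float = 30.0) -> float:
    """Hypothetical helper: cap perplexity so that perplexity < n_samples."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least 2 samples")
    return min(requested, n_samples - 1)


print(choose_perplexity(30))   # 29
print(choose_perplexity(711))  # 30.0
```

Perplexity roughly sets how many neighbours each point attends to, so for a few dozen rows a value well below the cap (the notebook cell uses 10) is usually a more sensible choice than the maximum allowed.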
data_transposed
| Calendar Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Authority | |||||||||||||||||
| Birmingham | 7059.334372 | 6952.112034 | 6800.522322 | 6744.677640 | 6099.737055 | 6412.804284 | 5805.907414 | 6094.847957 | 5928.649539 | 5283.528982 | 5187.812112 | 5018.990855 | 4952.980766 | 4694.297182 | 4500.378294 | 4177.186378 | 4480.654084 |
| Bromsgrove | 956.642291 | 966.027098 | 1014.325992 | 966.508457 | 915.239428 | 943.043254 | 865.875640 | 885.439318 | 875.957613 | 847.917226 | 839.358149 | 828.019311 | 799.333806 | 765.570975 | 756.273104 | 655.139864 | 711.760256 |
| Cannock Chase | 588.979929 | 597.874966 | 584.212829 | 554.965633 | 534.431424 | 565.871617 | 516.643905 | 523.651492 | 521.293200 | 463.806795 | 448.224377 | 432.415392 | 438.056787 | 428.182712 | 400.378294 | 360.878619 | 364.042820 |
| Coventry | 2285.257172 | 2244.459643 | 2109.501041 | 2028.053404 | 1835.162506 | 1962.611578 | 1766.002096 | 1881.254900 | 1815.756821 | 1664.671871 | 1647.571233 | 1574.337656 | 1548.946359 | 1496.667926 | 1428.701726 | 1280.312165 | 1344.000349 |
| Dudley | 1983.590348 | 1972.154700 | 1888.505634 | 1818.345968 | 1657.763334 | 1778.665829 | 1603.028674 | 1645.915361 | 1619.695684 | 1452.515600 | 1412.477913 | 1356.141083 | 1347.120418 | 1295.213550 | 1217.885198 | 1079.756656 | 1163.033149 |
| East Staffordshire | 1287.270750 | 1298.775799 | 1260.546128 | 1245.328814 | 1170.588639 | 1228.732489 | 1160.773439 | 1165.877455 | 1143.246946 | 1050.973296 | 1014.291935 | 988.598271 | 981.240089 | 966.664142 | 903.141998 | 810.544121 | 853.521292 |
| Herefordshire, County of | 2091.684474 | 2090.207781 | 2038.853224 | 1975.986634 | 1872.500149 | 1994.484316 | 1849.438619 | 1892.331768 | 1847.560317 | 1777.587181 | 1696.464807 | 1660.690961 | 1618.084774 | 1598.639996 | 1527.132610 | 1364.626064 | 1472.865799 |
| Lichfield | 1004.513597 | 1022.789515 | 1013.242837 | 993.374843 | 955.231477 | 981.016859 | 931.692962 | 938.137617 | 935.001562 | 875.810423 | 863.532062 | 850.456232 | 851.408592 | 837.842706 | 813.911260 | 691.404937 | 754.890754 |
| Malvern Hills | 772.773159 | 799.621230 | 848.442583 | 797.397569 | 757.938462 | 782.645008 | 723.285139 | 726.094378 | 717.922272 | 695.654404 | 688.580938 | 676.822279 | 655.588099 | 642.098370 | 619.590085 | 538.434311 | 601.444378 |
| Newcastle-under-Lyme | 1121.822896 | 1113.654899 | 1107.267011 | 1087.895669 | 1042.573930 | 1083.119199 | 1024.749999 | 1024.635624 | 1021.924809 | 939.404697 | 969.297348 | 944.169090 | 929.997916 | 911.784056 | 875.110449 | 784.361909 | 844.190037 |
| North Warwickshire | 1147.478903 | 1203.041332 | 1226.193958 | 1194.492043 | 1121.940224 | 1191.290419 | 1085.180037 | 1110.955808 | 1077.930006 | 1065.117153 | 1060.260000 | 1029.729619 | 1025.527236 | 1001.285910 | 961.051901 | 823.867578 | 919.630410 |
| Nuneaton and Bedworth | 813.097075 | 815.719471 | 861.239859 | 828.813779 | 772.443801 | 765.017013 | 677.858160 | 704.027486 | 684.424974 | 638.996823 | 634.593829 | 606.356718 | 597.691045 | 593.546030 | 563.796985 | 505.938444 | 540.854821 |
| Redditch | 585.906245 | 591.548822 | 632.697674 | 593.920918 | 533.689582 | 577.553981 | 522.724477 | 528.282173 | 500.489462 | 468.059313 | 444.165840 | 421.000712 | 406.630410 | 394.836203 | 383.634496 | 332.680616 | 357.787785 |
| Rugby | 2330.213067 | 2373.288078 | 2651.893434 | 2411.372020 | 2278.320929 | 2283.729045 | 2334.076177 | 2124.141199 | 2157.994109 | 2133.777044 | 1993.474564 | 2109.935757 | 2000.143439 | 2041.847091 | 1988.209932 | 1853.518649 | 2026.454350 |
| Sandwell | 2289.664199 | 2311.288395 | 2271.410144 | 2149.609456 | 1915.683984 | 2035.995284 | 1862.670040 | 1942.089900 | 1917.089725 | 1748.667558 | 1785.651958 | 1577.895063 | 1587.577139 | 1534.233142 | 1453.591967 | 1315.105902 | 1401.088684 |
| Shropshire | 3788.039158 | 3772.811560 | 3701.754274 | 3605.369762 | 3330.136985 | 3537.762726 | 3277.426402 | 3379.256702 | 3297.401331 | 3129.205145 | 3092.862487 | 3029.587506 | 2980.901584 | 2986.369707 | 2882.366244 | 2645.075805 | 2773.370585 |
| Solihull | 1735.803608 | 1801.630633 | 1757.701820 | 1667.849693 | 1565.457992 | 1683.199281 | 1588.399097 | 1667.768397 | 1627.491397 | 1492.473058 | 1518.294020 | 1475.326158 | 1427.841084 | 1410.483026 | 1327.805102 | 1145.680549 | 1226.181599 |
| South Staffordshire | 1142.941882 | 1174.670664 | 1167.283617 | 1104.231784 | 1101.010671 | 1130.681369 | 1083.159234 | 1088.067517 | 1082.128759 | 1028.457756 | 1000.357801 | 1002.784932 | 1034.087269 | 1000.812994 | 973.519842 | 870.868301 | 954.939263 |
| Stafford | 1593.609395 | 1616.541260 | 1569.472252 | 1566.564484 | 1533.426323 | 1571.727852 | 1488.504814 | 1541.704997 | 1510.290365 | 1422.200740 | 1409.541939 | 1378.918704 | 1343.494927 | 1325.675278 | 1270.246497 | 1101.396010 | 1166.734395 |
| Staffordshire Moorlands | 1776.923545 | 1766.082186 | 1742.557987 | 1656.608649 | 1559.741550 | 1655.713572 | 1603.156284 | 1544.428454 | 1564.011891 | 1546.833793 | 1525.699941 | 1497.001650 | 1413.357126 | 1475.730299 | 1386.199936 | 1274.232906 | 1332.991944 |
| Stoke-on-Trent | 2041.728131 | 2025.362057 | 1977.856972 | 1900.940420 | 1777.333094 | 1764.352853 | 1581.980925 | 1584.479240 | 1577.177978 | 1606.524597 | 1550.265296 | 1507.331663 | 1261.650147 | 1223.080833 | 1153.929946 | 1020.164468 | 1127.356084 |
| Stratford-on-Avon | 1501.480094 | 1513.735452 | 1541.393414 | 1488.757386 | 1386.832211 | 1436.903217 | 1352.934775 | 1446.126565 | 1414.636678 | 1431.627278 | 1322.595288 | 1324.473512 | 1317.037715 | 1314.087715 | 1223.348640 | 1042.939059 | 1120.139928 |
| Tamworth | 475.068557 | 474.697517 | 448.760723 | 436.457918 | 421.964679 | 440.200738 | 404.836659 | 414.618735 | 406.701043 | 363.694164 | 347.631884 | 326.917746 | 317.951130 | 318.231567 | 290.820254 | 254.020488 | 268.838288 |
| Telford and Wrekin | 1659.122170 | 1685.979582 | 1682.256128 | 1530.383297 | 1419.460432 | 1502.746044 | 1426.316317 | 1494.792938 | 1385.950669 | 1274.826827 | 1254.843761 | 1160.254419 | 1154.173983 | 1183.265359 | 1119.790339 | 952.853681 | 1002.876199 |
| Walsall | 1841.638129 | 1829.666727 | 1772.033335 | 1673.938533 | 1488.754727 | 1624.627863 | 1525.409270 | 1549.103445 | 1531.029950 | 1380.643572 | 1300.406164 | 1250.226818 | 1242.130066 | 1201.211275 | 1119.598232 | 1009.154672 | 1088.510816 |
| Warwick | 1391.720595 | 1400.843990 | 1409.130871 | 1362.768363 | 1273.867598 | 1277.589915 | 1173.237382 | 1216.060961 | 1177.002231 | 1071.245807 | 1097.625980 | 1065.827135 | 1033.018272 | 1010.087886 | 964.526191 | 797.752307 | 892.125608 |
| Wolverhampton | 1677.179103 | 1629.700831 | 1587.437458 | 1536.582527 | 1398.213887 | 1493.486284 | 1370.306537 | 1402.082164 | 1367.714396 | 1232.190047 | 1196.524784 | 1135.863036 | 1090.199900 | 1060.553002 | 989.441812 | 917.369776 | 985.990590 |
| Worcester | 643.542399 | 616.473972 | 657.726991 | 608.166828 | 572.476104 | 601.253296 | 557.011139 | 575.389895 | 551.444982 | 496.471837 | 475.843472 | 447.628667 | 420.763103 | 406.786384 | 381.230693 | 344.888325 | 375.993686 |
| Wychavon | 1447.518551 | 1504.532587 | 1542.140801 | 1467.778761 | 1367.405328 | 1425.701966 | 1300.782922 | 1309.649052 | 1311.106432 | 1260.514044 | 1248.715685 | 1232.972760 | 1205.932168 | 1196.208234 | 1145.581204 | 1013.927972 | 1148.046126 |
| Wyre Forest | 697.678624 | 699.747403 | 750.464334 | 692.233167 | 628.671292 | 664.291809 | 616.665147 | 630.668567 | 612.868789 | 570.347427 | 543.650712 | 522.982812 | 499.390383 | 489.710354 | 458.909804 | 408.559610 | 436.541231 |
data_transposed.dtypes
Calendar Year
2005    float64
2006    float64
2007    float64
2008    float64
2009    float64
2010    float64
2011    float64
2012    float64
2013    float64
2014    float64
2015    float64
2016    float64
2017    float64
2018    float64
2019    float64
2020    float64
2021    float64
dtype: object
import matplotlib.pyplot as plt
# plt.scatter needs explicit x and y: one point per local authority, 2005 vs 2021 emissions
plt.scatter(data_transposed.iloc[:, 0], data_transposed.iloc[:, -1])
plt.xlabel('2005 emissions (kt CO2e)')
plt.ylabel('2021 emissions (kt CO2e)')
plt.show()
import pandas as pd
import numpy as np
from sklearn.manifold import TSNE
import seaborn as sns
import matplotlib.pyplot as plt
# `data` is the emissions DataFrame defined earlier in the notebook
# Keep only numeric columns; note that 'Calendar Year' is one of them and becomes a feature
numerical_data = data.select_dtypes(include=[np.number])
# Handle missing values (including the empty trailing rows) by filling with each column's mean
numerical_data_filled = numerical_data.fillna(numerical_data.mean())
# Sample 10% of the rows to keep the computation manageable (adjust 'frac' to change the size)
sampled_data = numerical_data_filled.sample(frac=0.1, random_state=0)
# t-SNE transformation (features are not standardised, so large-valued columns dominate distances)
tsne = TSNE(n_components=2, random_state=0)
tsne_results = tsne.fit_transform(sampled_data)
# Creating a DataFrame for the t-SNE results
tsne_df = pd.DataFrame({
'TSNE1': tsne_results[:, 0],
'TSNE2': tsne_results[:, 1]
})
# Plotting the t-SNE results
plt.figure(figsize=(12, 8))
sns.scatterplot(x='TSNE1', y='TSNE2', color='blue', edgecolor='black', data=tsne_df)
plt.title('t-SNE visualization of sampled data')
plt.xlabel('TSNE Component 1')
plt.ylabel('TSNE Component 2')
plt.grid(True)
plt.show()
numerical_data
| | Calendar Year | Commercial 'Other' | Net Emissions: Indirect N2O | Agriculture Gas | Per Capita Emissions (tCO2e) | Emissions per km2 (kt CO2e) |
|---|---|---|---|---|---|---|
| 0 | 2005.0 | 0.6 | 0.2 | 0.4 | 9.7 | 4.9 |
| 1 | 2006.0 | 0.6 | 0.2 | 0.3 | 9.3 | 4.8 |
| 2 | 2007.0 | 0.6 | 0.2 | 0.3 | 9.0 | 4.7 |
| 3 | 2008.0 | 0.7 | 0.1 | 0.3 | 8.4 | 4.4 |
| 4 | 2009.0 | 0.6 | 0.1 | 0.3 | 7.7 | 4.1 |
| ... | ... | ... | ... | ... | ... | ... |
| 7103 | 2019.0 | 347.0 | 152.4 | 813.6 | 6.2 | 1.7 |
| 7104 | 2020.0 | 191.7 | 152.8 | 923.3 | 5.6 | 1.5 |
| 7105 | 2021.0 | 223.1 | 152.3 | 903.4 | 6.0 | 1.6 |
| 7106 | NaN | NaN | NaN | NaN | NaN | NaN |
| 7107 | NaN | NaN | NaN | NaN | NaN | NaN |
7108 rows × 6 columns
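The numeric columns above sit on very different scales: 'Calendar Year' is around 2005 to 2021, while the sector columns run from below 1 to hundreds of kt CO2e. One common option, not used in the notebook, is to standardise each column (z-score) before running t-SNE so that no column dominates the distances. A pandas-only sketch on a made-up frame; scikit-learn's `StandardScaler` performs the same transformation (it uses the population standard deviation, i.e. `ddof=0`).

```python
import pandas as pd

# Made-up frame with columns on very different scales
frame = pd.DataFrame({
    'Calendar Year': [2005.0, 2013.0, 2021.0],
    'Agriculture Gas': [0.4, 300.0, 903.4],
})

# z-score each column: subtract its mean, divide by its (population) standard deviation
standardised = (frame - frame.mean()) / frame.std(ddof=0)

print(standardised.round(3))
```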
tsne_df
| | TSNE1 | TSNE2 |
|---|---|---|
| 0 | -23.196171 | 34.333500 |
| 1 | 15.931448 | -3.052190 |
| 2 | -9.544743 | -12.004598 |
| 3 | 17.712883 | -12.668040 |
| 4 | 23.051466 | -1.983216 |
| ... | ... | ... |
| 706 | 1.385328 | -5.390362 |
| 707 | -20.086830 | 15.288509 |
| 708 | -13.126957 | -15.810575 |
| 709 | 25.429058 | 1.221480 |
| 710 | -11.236401 | -8.841590 |
711 rows × 2 columns
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns
# Alias (not a copy): the 'Cluster' column added below also appears in tsne_df
tsne_results = tsne_df
# Perform KMeans clustering on the two t-SNE components
# The number of clusters is set to 5; n_init is set explicitly to 10 (the pre-1.4 default)
# so results do not change when scikit-learn's default becomes 'auto'
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
tsne_results['Cluster'] = kmeans.fit_predict(tsne_results[['TSNE1', 'TSNE2']])
# Plotting the t-SNE results with different colors for each cluster
plt.figure(figsize=(12, 8))
sns.scatterplot(x='TSNE1', y='TSNE2', hue='Cluster', palette='viridis', data=tsne_results, legend='full')
# Title and labels
plt.title('t-SNE visualization with Cluster Colors')
plt.xlabel('TSNE Component 1')
plt.ylabel('TSNE Component 2')
plt.legend(title='Cluster')
plt.grid(True)
plt.show()
C:\ProgramData\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=3. warnings.warn(
tsne_df
| | TSNE1 | TSNE2 | Cluster |
|---|---|---|---|
| 0 | -23.196171 | 34.333500 | 4 |
| 1 | 15.931448 | -3.052190 | 3 |
| 2 | -9.544743 | -12.004598 | 2 |
| 3 | 17.712883 | -12.668040 | 3 |
| 4 | 23.051466 | -1.983216 | 3 |
| ... | ... | ... | ... |
| 706 | 1.385328 | -5.390362 | 2 |
| 707 | -20.086830 | 15.288509 | 0 |
| 708 | -13.126957 | -15.810575 | 2 |
| 709 | 25.429058 | 1.221480 | 3 |
| 710 | -11.236401 | -8.841590 | 2 |
711 rows × 3 columns
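Colouring points by cluster shows the grouping but not its size or location. One way to summarise it, sketched here on a hypothetical five-row stand-in for `tsne_df` (the values are invented), is a `groupby` with named aggregation. Bear in mind that t-SNE preserves local neighbourhoods better than global distances, so centroid positions are best read as labels for the groups rather than as measurements.

```python
import pandas as pd

# Hypothetical stand-in for tsne_df after clustering
points = pd.DataFrame({
    'TSNE1': [-23.0, -20.0, 15.0, 17.0, 23.0],
    'TSNE2': [34.0, 15.0, -3.0, -12.0, -2.0],
    'Cluster': [4, 0, 3, 3, 3],
})

# Size of each cluster and its centroid in t-SNE space
summary = points.groupby('Cluster').agg(
    size=('TSNE1', 'size'),
    TSNE1_mean=('TSNE1', 'mean'),
    TSNE2_mean=('TSNE2', 'mean'),
)
print(summary)
```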
tsne_df.to_csv('tsne.csv', index=False)
# Sort the regions based on average per capita emissions
sorted_per_capita = average_emissions['Per Capita Emissions'].sort_values(ascending=False)
# Plotting average per capita emissions for each region
plt.figure(figsize=(14, 10))
sorted_per_capita.plot(kind='bar', color='lightblue')
plt.title('Average Per Capita Emissions by Region')
plt.ylabel('Average Per Capita Emissions (t CO2e per person)')
plt.xlabel('Region/Country')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import pandas as pd
# After the groupby, 'Region/Country' is the index rather than a column, so filter on the index
# to exclude the aggregate rows whose names end with 'Total'
filtered_average_emissions = average_emissions[~average_emissions.index.str.endswith('Total')]
# Drop regions without data (e.g. 'Unallocated') and sort by average per capita emissions
sorted_per_capita = filtered_average_emissions['Per Capita Emissions'].dropna().sort_values(ascending=False)
# Plotting average per capita emissions for each region
plt.figure(figsize=(14, 10))
sorted_per_capita.plot(kind='bar', color='lightblue')
plt.title('Average Per Capita Emissions by Region (Excluding Totals)')
plt.ylabel('Average Per Capita Emissions (t CO2e per person)')
plt.xlabel('Region/Country')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
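Selecting `average_emissions['Region/Country']` fails because `groupby('Region/Country')` moves the grouping key into the index of the result. A minimal sketch on made-up rows (values rounded from the regional table above, purely illustrative) shows the behaviour and the index-based filter:

```python
import pandas as pd

# Hypothetical rows mimicking data_1_1_actual
rows = pd.DataFrame({
    'Region/Country': ['Wales', 'Wales Total', 'London', 'London Total'],
    'Per Capita Emissions': [12.0, 11.7, 10.3, 5.1],
})

grouped = rows.groupby('Region/Country')[['Per Capita Emissions']].mean()
print('Region/Country' in grouped.columns)  # False: the key became the index
print(grouped.index.name)                   # Region/Country

# Filter on the index (or call reset_index() first to get the column back)
regions_only = grouped[~grouped.index.str.endswith('Total')]
print(regions_only.index.tolist())  # ['London', 'Wales']
```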
# Sort the regions based on average emissions per km^2
sorted_per_km2 = average_emissions['Emissions per km^2'].sort_values(ascending=False)
# Plotting average emissions per km^2 for each region
plt.figure(figsize=(14, 10))
sorted_per_km2.plot(kind='bar', color='lightcoral')
plt.title('Average Emissions per km^2 by Region')
plt.ylabel('Average Emissions per km^2 (kt CO2e per km^2)')
plt.xlabel('Region/Country')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
# Filter the data for 'National Total' and group by 'Calendar Year' to get yearly national emissions
national_emissions = data_1_1_actual[data_1_1_actual['Region/Country'] == 'National Total']
national_time_series = national_emissions.groupby('Calendar Year')['Grand Total'].sum()
# Plotting the national total emissions time series
plt.figure(figsize=(12, 6))
national_time_series.plot(marker='o', linestyle='-', color='black')
plt.title('National Total Emissions Over Time (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.grid(True)
plt.show()
# Convert the yearly national series computed above into a DataFrame with 'Calendar Year' as a column
national_time_series_table = national_time_series.reset_index()
# Display the DataFrame as a table
national_time_series_table
| | Calendar Year | Grand Total |
|---|---|---|
| 0 | 2005 | 656867.315366 |
| 1 | 2006 | 649603.223057 |
| 2 | 2007 | 634904.263654 |
| 3 | 2008 | 614046.665833 |
| 4 | 2009 | 558455.613837 |
| 5 | 2010 | 571129.481835 |
| 6 | 2011 | 525842.800141 |
| 7 | 2012 | 544172.639988 |
| 8 | 2013 | 531229.165974 |
| 9 | 2014 | 491490.767041 |
| 10 | 2015 | 475077.919255 |
| 11 | 2016 | 449383.499987 |
| 12 | 2017 | 437824.383087 |
| 13 | 2018 | 430745.509283 |
| 14 | 2019 | 416856.663324 |
| 15 | 2020 | 376807.810496 |
| 16 | 2021 | 399046.140782 |
national_time_series_table['Grand Total'].mean()
515499.05076104676
sns.histplot(data=national_time_series_table,x='Grand Total',bins=20)
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Initialize the linear regression model
model = LinearRegression()
# Reshape the data and fit the model
X = national_time_series_table['Calendar Year'].values.reshape(-1, 1) # Independent variable
y = national_time_series_table['Grand Total'].values # Dependent variable
model.fit(X, y)
# Make predictions
national_time_series_table['Predicted Emission'] = model.predict(X)
# Coefficients
slope = model.coef_[0]
intercept = model.intercept_
# R^2 score
r_squared = model.score(X, y)
# Plotting the actual vs predicted values
plt.figure(figsize=(14, 7))
plt.scatter(X, y, color='blue', edgecolors='black', label='Actual Emission')
plt.plot(X, national_time_series_table['Predicted Emission'], color='red', linewidth=2, label='Predicted Emission')
plt.title('National Annual Emissions for All Sectors (2005-2021)')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.grid(True)
plt.legend()
plt.show()
(slope, intercept, r_squared)
(-17644.370521085086, 36033616.90970532, 0.9742324674401649)
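The intercept of about 36 million kt CO2e is the line's value in year 0, so it has no physical meaning. A sketch that refits the same straight line with time measured as years since 2005 (using `numpy.polyfit` in place of `LinearRegression`; both give the ordinary least squares line) keeps the slope and R² but turns the intercept into the fitted 2005 level, roughly 656,654 kt CO2e:

```python
import numpy as np

# National 'Grand Total' values (kt CO2e) copied from the table above
years = np.arange(2005, 2022)
totals = np.array([
    656867.315366, 649603.223057, 634904.263654, 614046.665833, 558455.613837,
    571129.481835, 525842.800141, 544172.639988, 531229.165974, 491490.767041,
    475077.919255, 449383.499987, 437824.383087, 430745.509283, 416856.663324,
    376807.810496, 399046.140782,
])

# Measure time as years since 2005 so the intercept is the fitted 2005 level
t = years - 2005
slope, intercept_2005 = np.polyfit(t, totals, deg=1)

# R² from the residual and total sums of squares
fitted = intercept_2005 + slope * t
r_squared = 1 - np.sum((totals - fitted) ** 2) / np.sum((totals - totals.mean()) ** 2)

print(f"slope = {slope:.1f} kt CO2e/year, 2005 level = {intercept_2005:.0f}, R² = {r_squared:.4f}")
```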
import numpy as np
# Predicting the emissions for the year 2024
year_2024 = np.array([[2024]])
emissions_2024 = model.predict(year_2024)
emissions_2024[0]
321410.97502910346
# Predicting the emissions for the year 2022
year_2022 = np.array([[2022]])
emissions_forecast_2022 = model.predict(year_2022)
emissions_forecast_2022[0]
356699.71607127786
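Both forecasts are simply the fitted line evaluated outside the 2005-2021 range, so they assume the linear decline of roughly 17,644 kt CO2e per year continues unchanged. A hand check with the printed coefficients reproduces them:

```python
# Coefficients printed by the LinearRegression fitted above (all 17 years, 2020 included)
slope = -17644.370521085086
intercept = 36033616.90970532

def predict(year):
    """Straight-line forecast in kt CO2e: intercept + slope * year."""
    return intercept + slope * year

for year in (2022, 2024):
    print(year, round(predict(year), 2))
```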
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Exclude the year 2020 from the dataset for training
training_data = national_time_series_table[national_time_series_table['Calendar Year'] != 2020]
# Initialize the linear regression model
model = LinearRegression()
# Reshape the data and fit the model
X_train = training_data['Calendar Year'].values.reshape(-1, 1) # Independent variable (excluding 2020)
y_train = training_data['Grand Total'].values # Dependent variable (excluding 2020)
model.fit(X_train, y_train)
# Make predictions for the entire dataset including 2020
X_full = national_time_series_table['Calendar Year'].values.reshape(-1, 1)
national_time_series_table['Predicted Emission'] = model.predict(X_full)
# Predictions for 2022 to 2024
future_years = np.array([[2022], [2023], [2024]])
future_predictions = model.predict(future_years)
# Coefficients and R^2 score
slope = model.coef_[0]
intercept = model.intercept_
r_squared = model.score(X_train, y_train)
# Plotting the actual vs predicted values
plt.figure(figsize=(14, 7))
plt.scatter(X_train, y_train, color='blue', edgecolors='black', label='Actual Emission (Excluding 2020)')
plt.scatter([2020], national_time_series_table[national_time_series_table['Calendar Year'] == 2020]['Grand Total'], color='orange', edgecolors='black', label='Outlier Data (2020)')
plt.plot(X_full, national_time_series_table['Predicted Emission'], color='red', linewidth=2, label='Predicted Emission')
plt.scatter(future_years, future_predictions, color='green', edgecolors='black', label='Predicted Emission (2022-2024)')
plt.title('National Annual Emissions for All Sectors (2005-2021, excluding 2020)')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.grid(True)
plt.legend()
plt.show()
# Extracting the predicted emissions for 2022, 2023, and 2024
predicted_2022 = future_predictions[0]
predicted_2023 = future_predictions[1]
predicted_2024 = future_predictions[2]
# Printing the predicted emissions
print(f"Predicted Emissions for 2022: {predicted_2022}")
print(f"Predicted Emissions for 2023: {predicted_2023}")
print(f"Predicted Emissions for 2024: {predicted_2024}")
Predicted Emissions for 2022: 360642.15265548974 Predicted Emissions for 2023: 343314.9896756634 Predicted Emissions for 2024: 325987.82669582963
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Calculating MSE, MAE, and R^2 score using the training data (excluding 2020)
mse = mean_squared_error(y_train, model.predict(X_train))
mae = mean_absolute_error(y_train, model.predict(X_train))
r2 = r2_score(y_train, model.predict(X_train))
# Printing the metrics
print(f"Mean Squared Error (MSE): {mse}")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"R-squared (R²) Score: {r2}")
Mean Squared Error (MSE): 192430790.51198345 Mean Absolute Error (MAE): 11517.010197187687 R-squared (R²) Score: 0.9719953551351934
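These metrics are computed on the same 16 points the model was fitted to, so they describe goodness of fit rather than forecast accuracy; an MSE of about 1.92 × 10⁸ corresponds to an RMSE of roughly 13,900 kt CO2e. A minimal sketch of a time-ordered check, assuming the three most recent available years (2018, 2019 and 2021) as a hold-out set and using `numpy.polyfit` for the same least-squares line (values rounded to one decimal):

```python
import numpy as np

# National totals (kt CO2e) from the table above, with 2020 dropped as in the cell above
years = np.array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
                  2014, 2015, 2016, 2017, 2018, 2019, 2021])
totals = np.array([656867.3, 649603.2, 634904.3, 614046.7, 558455.6, 571129.5,
                   525842.8, 544172.6, 531229.2, 491490.8, 475077.9, 449383.5,
                   437824.4, 430745.5, 416856.7, 399046.1])

# Fit on the earlier years only and score on the most recent ones
train = years <= 2017
slope, intercept = np.polyfit(years[train], totals[train], deg=1)
pred = intercept + slope * years[~train]

mae = np.mean(np.abs(totals[~train] - pred))
rmse = np.sqrt(np.mean((totals[~train] - pred) ** 2))
print(f"held-out years: {years[~train].tolist()}, MAE = {mae:.0f}, RMSE = {rmse:.0f}")
```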
import pandas as pd
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt
# Exclude the year 2020 from the dataset for training
training_data = national_time_series_table[national_time_series_table['Calendar Year'] != 2020]
# Reshape the data
X_train = training_data['Calendar Year'].values.reshape(-1, 1) # Independent variable (excluding 2020)
y_train = training_data['Grand Total'].values # Dependent variable (excluding 2020)
# It's important to scale features when using SGD
pipeline = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
# Train the SGD model
sgd_model = pipeline.fit(X_train, y_train)
# Make predictions for the entire dataset including 2020
X_full = national_time_series_table['Calendar Year'].values.reshape(-1, 1)
national_time_series_table['Predicted Emission'] = sgd_model.predict(X_full)
# Predictions for 2022 to 2024
future_years = np.array([[2022], [2023], [2024]])
future_predictions = sgd_model.predict(future_years)
# Plotting the actual vs predicted values
plt.figure(figsize=(14, 7))
plt.scatter(X_train, y_train, color='blue', edgecolors='black', label='Actual Emission (Excluding 2020)')
plt.scatter([2020], national_time_series_table[national_time_series_table['Calendar Year'] == 2020]['Grand Total'], color='orange', edgecolors='black', label='Outlier Data (2020)')
plt.plot(X_full, national_time_series_table['Predicted Emission'], color='red', linewidth=2, label='Predicted Emission')
plt.scatter(future_years, future_predictions, color='green', edgecolors='black', label='Predicted Emission (2022-2024)')
plt.title('National Annual Emissions for All Sectors (2005-2021, excluding 2020)')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.grid(True)
plt.legend()
plt.show()
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Calculating the metrics
mse = mean_squared_error(y_train, sgd_model.predict(X_train))
mae = mean_absolute_error(y_train, sgd_model.predict(X_train))
r2 = r2_score(y_train, sgd_model.predict(X_train))
# Printing the metrics
print(f"Mean Squared Error (MSE): {mse}")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"R-squared (R²) Score: {r2}")
Mean Squared Error (MSE): 192431070.8977003 Mean Absolute Error (MAE): 11522.05633117702 R-squared (R²) Score: 0.9719953143303803
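The SGD results are almost identical to ordinary least squares because, on one standardised feature, the squared-error loss is a simple convex bowl that gradient methods descend to the same minimum. `SGDRegressor` updates on one sample at a time and by default adds a small L2 penalty (`alpha=0.0001`), which may account for the tiny differences. The sketch below uses plain full-batch gradient descent instead (a simplification, not what `SGDRegressor` does internally) and checks the slope against the closed-form fit:

```python
import numpy as np

# Same 16 training years (2020 excluded); totals in kt CO2e, rounded
years = np.array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
                  2014, 2015, 2016, 2017, 2018, 2019, 2021], dtype=float)
totals = np.array([656867.3, 649603.2, 634904.3, 614046.7, 558455.6, 571129.5,
                   525842.8, 544172.6, 531229.2, 491490.8, 475077.9, 449383.5,
                   437824.4, 430745.5, 416856.7, 399046.1])

# Standardise the feature, as StandardScaler does in the pipeline above
mu, sigma = years.mean(), years.std()
z = (years - mu) / sigma

# Full-batch gradient descent on the mean squared error
w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    err = w * z + b - totals
    w -= lr * 2 * np.mean(err * z)
    b -= lr * 2 * np.mean(err)

# Map the weight back to original units (kt CO2e per year) and compare with OLS
gd_slope = w / sigma
ols_slope, _ = np.polyfit(years, totals, deg=1)
print(gd_slope, ols_slope)
```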
%pip install statsmodels
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.nonparametric.kernel_regression import KernelReg
# Use the data excluding the outlier year 2020
training_data = national_time_series_table[national_time_series_table['Calendar Year'] != 2020]
# Prepare the data
X_train = training_data['Calendar Year'].values
y_train = training_data['Grand Total'].values
# Kernel Regression
kr = KernelReg(endog=y_train, exog=X_train, var_type='c') # 'c' for continuous
X_predict = np.linspace(X_train.min(), X_train.max(), 100) # Range of years for prediction
# KernelReg.fit returns (mean, marginal effects); the second array is the derivative
# of the fitted curve, not a standard error, so it is not drawn as an uncertainty band
y_kr, y_mfx = kr.fit(X_predict)
# Plotting the actual data and the kernel regression prediction
plt.figure(figsize=(14, 7))
plt.scatter(X_train, y_train, color='blue', label='Actual Emissions (Excluding 2020)')
plt.plot(X_predict, y_kr, color='red', label='Kernel Regression Prediction')
plt.title('National Annual Emissions for All Sectors (2005-2021, excluding 2020)')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.legend()
plt.grid(True)
plt.show()
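`KernelReg` defaults to local linear regression (`reg_type='ll'`) with a bandwidth chosen by least-squares cross-validation (`bw='cv_ls'`). The simpler local-constant form, the Nadaraya-Watson estimator, shows the core idea: each prediction is a weighted average of the observed values, with weights that decay with distance in years. A minimal numpy sketch with a Gaussian kernel (the function name and sample points are illustrative):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Local-constant kernel regression with a Gaussian kernel."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Gaussian weights: one row per query point, one column per training point
    u = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    # Weighted average of the observed y values
    return weights @ y_train / weights.sum(axis=1)

years = np.array([2005, 2010, 2015, 2021], dtype=float)
totals = np.array([656867.3, 571129.5, 475077.9, 399046.1])
print(nadaraya_watson(years, totals, [2005, 2013, 2024], bandwidth=2.0))
```

Away from the data the local-constant estimate flattens towards the nearest observations; here the 2024 value sits just above the 2021 total. That behaviour is worth keeping in mind when reading the 2022-2024 kernel-regression forecasts further on, although local linear fits extrapolate somewhat differently.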
from sklearn.metrics import mean_squared_error, mean_absolute_error
# Predictions on the training data
y_pred_train, _ = kr.fit(X_train)
# Calculating MSE and MAE
mse = mean_squared_error(y_train, y_pred_train)
mae = mean_absolute_error(y_train, y_pred_train)
# Printing the metrics
print(f"Mean Squared Error (MSE): {mse}")
print(f"Mean Absolute Error (MAE): {mae}")
# R² could also be computed (e.g. with sklearn's r2_score), but on the training data it flatters a flexible model such as kernel regression
Mean Squared Error (MSE): 98854665.32507062 Mean Absolute Error (MAE): 7560.483432559933
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import numpy as np
# Define a range of bandwidths to test
bandwidths = np.linspace(0.1, 2, 20)
# Best bandwidth and score initialization
best_bandwidth = None
best_score = float('inf')
# K-Fold for cross-validation
kf = KFold(n_splits=5)
for bandwidth in bandwidths:
    mse_scores = []
    for train_index, test_index in kf.split(X_train):
        X_train_kf, X_test_kf = X_train[train_index], X_train[test_index]
        y_train_kf, y_test_kf = y_train[train_index], y_train[test_index]
        # Kernel Regression with the current bandwidth
        kr = KernelReg(endog=y_train_kf, exog=X_train_kf, var_type='c', bw=[bandwidth])
        y_pred_kf, _ = kr.fit(X_test_kf)
        # Calculate MSE for this fold
        mse = mean_squared_error(y_test_kf, y_pred_kf)
        mse_scores.append(mse)
    # Average MSE score for this bandwidth
    avg_mse = np.mean(mse_scores)
    if avg_mse < best_score:
        best_score = avg_mse
        best_bandwidth = bandwidth
# Best bandwidth
print(f"Best Bandwidth: {best_bandwidth}")
Best Bandwidth: 2.0
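The best bandwidth, 2.0, is the largest value in the grid, so the cross-validated optimum may lie beyond it. Note also that `KFold(n_splits=5)` without shuffling holds out contiguous blocks of years, so the first and last folds test extrapolation rather than interpolation. A sketch of leave-one-out cross-validation over a wider grid, using the local-constant estimator rather than `KernelReg`'s local linear default (so its optimum need not match), looks like this:

```python
import numpy as np

def nw_predict(x_tr, y_tr, x_q, h):
    """Gaussian-kernel Nadaraya-Watson estimate at the query points x_q."""
    u = (np.atleast_1d(x_q)[:, None] - x_tr[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return w @ y_tr / w.sum(axis=1)

def loo_mse(x, y, h):
    """Leave-one-out mean squared error for bandwidth h."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        pred = nw_predict(x[keep], y[keep], x[i], h)[0]
        errors.append((y[i] - pred) ** 2)
    return np.mean(errors)

years = np.array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
                  2014, 2015, 2016, 2017, 2018, 2019, 2021], dtype=float)
totals = np.array([656867.3, 649603.2, 634904.3, 614046.7, 558455.6, 571129.5,
                   525842.8, 544172.6, 531229.2, 491490.8, 475077.9, 449383.5,
                   437824.4, 430745.5, 416856.7, 399046.1])

# Search a wider grid than 0.1-2.0 so an optimum on the old boundary can be detected
grid = np.linspace(0.25, 6.0, 24)
scores = [loo_mse(years, totals, h) for h in grid]
best_h = grid[int(np.argmin(scores))]
print(f"best bandwidth on this grid: {best_h:.2f}")
```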
# Fitting the Kernel Regression model with the best bandwidth
kr_optimal = KernelReg(endog=y_train, exog=X_train, var_type='c', bw=[best_bandwidth])
y_pred_optimal, _ = kr_optimal.fit(X_train)
# Plotting
plt.figure(figsize=(14, 7))
plt.scatter(X_train, y_train, color='blue', label='Actual Emissions (Excluding 2020)')
plt.plot(X_train, y_pred_optimal, color='red', label='Kernel Regression Prediction (Optimal Bandwidth)')
plt.title('National Annual Emissions for All Sectors (2005-2021, excluding 2020)')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.legend()
plt.grid(True)
plt.show()
from sklearn.metrics import mean_squared_error, mean_absolute_error
# Making predictions on the training data using the optimal bandwidth model
y_pred_optimal, _ = kr_optimal.fit(X_train)
# Calculating MSE and MAE
mse_optimal = mean_squared_error(y_train, y_pred_optimal)
mae_optimal = mean_absolute_error(y_train, y_pred_optimal)
# Printing the metrics
print(f"Mean Squared Error (MSE): {mse_optimal}")
print(f"Mean Absolute Error (MAE): {mae_optimal}")
Mean Squared Error (MSE): 122452257.76925212 Mean Absolute Error (MAE): 8541.34711829967
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.nonparametric.kernel_regression import KernelReg
from sklearn.metrics import mean_squared_error, mean_absolute_error
# Re-create the national totals (kt CO2e, rounded to whole numbers) as a DataFrame, with 2020 omitted
data = {
    'Calendar Year': [2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2021],
    'Grand Total': [
        656867, 649603, 634904, 614046, 558455, 571129, 525842, 544172, 531229, 491490, 475077, 449383, 437824, 430745, 416856, 399046
    ]
}
national_time_series_table = pd.DataFrame(data)
# 2020 is already left out of the hard-coded data above, so every row is used for training
training_data = national_time_series_table
# Prepare the data
X_train = training_data['Calendar Year'].values
y_train = training_data['Grand Total'].values
# Best bandwidth (as obtained from previous grid search)
best_bandwidth = 2.0
# Kernel Regression with the optimal bandwidth
kr_optimal = KernelReg(endog=y_train, exog=X_train, var_type='c', bw=[best_bandwidth])
y_pred_optimal, _ = kr_optimal.fit(X_train)
# Calculating MSE and MAE
mse_optimal = mean_squared_error(y_train, y_pred_optimal)
mae_optimal = mean_absolute_error(y_train, y_pred_optimal)
# Predicting emissions for 2022, 2023, and 2024
future_years = np.array([2022, 2023, 2024]).reshape(-1, 1)
future_predictions, _ = kr_optimal.fit(future_years)
# Printing the metrics
print(f"Mean Squared Error (MSE): {mse_optimal}")
print(f"Mean Absolute Error (MAE): {mae_optimal}")
# Plotting the actual data, kernel regression prediction, and future predictions
plt.figure(figsize=(14, 7))
plt.scatter(X_train, y_train, color='blue', label='Actual Emissions (Excluding 2020)')
plt.plot(X_train, y_pred_optimal, color='red', label='Kernel Regression Prediction (Optimal Bandwidth)')
plt.scatter(future_years, future_predictions, color='green', marker='o', label='Predicted Emissions (2022-2024)')
plt.title('National Annual Emissions for All Sectors (2005-2021, excluding 2020)')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.legend()
plt.grid(True)
plt.show()
# Print predicted emissions for 2022, 2023, and 2024
for year, prediction in zip(range(2022, 2025), future_predictions):
    print(f"Predicted Emissions for {year}: {prediction}")
Mean Squared Error (MSE): 122454007.79715742 Mean Absolute Error (MAE): 8541.433550188427
Predicted Emissions for 2022: 388856.10498873994 Predicted Emissions for 2023: 379455.74041452305 Predicted Emissions for 2024: 370225.2050343985
# Plotting the actual data, kernel regression prediction, and future predictions
plt.figure(figsize=(14, 7))
plt.scatter(X_train, y_train, color='blue', label='Actual Emissions (Excluding 2020)')
plt.plot(X_train, y_pred_optimal, color='red', label='Kernel Regression Prediction (Optimal Bandwidth)')
plt.scatter(future_years, future_predictions, color='green', marker='o', label='Predicted Emissions (2022-2024)')
# Adjust x-axis to show each year
plt.xticks(np.arange(X_train.min(), future_years.max() + 1, 1))
plt.title('National Annual Emissions for All Sectors (2005-2021, excluding 2020)')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.legend()
plt.grid(True)
plt.show()
# Print predicted emissions for 2022, 2023, and 2024
for year, prediction in zip(range(2022, 2025), future_predictions):
    print(f"Predicted Emissions for {year}: {prediction}")
Predicted Emissions for 2022: 388856.10498873994 Predicted Emissions for 2023: 379455.74041452305 Predicted Emissions for 2024: 370225.2050343985
from sklearn.kernel_ridge import KernelRidge
import matplotlib.pyplot as plt
# Prepare the data: the table has 'Calendar Year' and 'Grand Total' columns (there is no 'Total Emission' column)
X = national_time_series_table['Calendar Year'].values.reshape(-1, 1)  # Independent variable
y = national_time_series_table['Grand Total'].values  # Dependent variable (kt CO2e)
# KernelRidge fits no intercept, so centre y and add the mean back after predicting
y_mean = y.mean()
# Kernel Ridge Regression with a radial basis function (RBF), i.e. Gaussian, kernel;
# alpha and gamma are left at their defaults and have not been tuned
kernel_model = KernelRidge(kernel='rbf')
kernel_model.fit(X, y - y_mean)
national_time_series_table['Predicted Emission Kernel'] = kernel_model.predict(X) + y_mean
# Plot the original data and the kernel ridge fit
plt.figure(figsize=(14, 7))
plt.scatter(X, y, color='black', label='Actual Emission')
plt.plot(X, national_time_series_table['Predicted Emission Kernel'], color='green', linestyle='--', linewidth=2, label='Kernel Ridge Regression')
plt.title('National Annual Emissions for All Sectors (2005-2021) - Kernel Ridge Regression Fit')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission (kt CO2e)')
plt.legend()
plt.show()
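Kernel ridge regression has a closed form: with kernel matrix K, the dual weights are (K + αI)⁻¹y, and a prediction is a kernel-weighted sum of those weights. `KernelRidge` fits no intercept, so unless y is centred a strong penalty pulls predictions towards zero rather than towards the mean of the data. A numpy sketch on toy data (function names and values are hypothetical) shows both effects:

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * (a_i - b_j)^2) for 1-D inputs."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def kernel_ridge_fit_predict(x, y, x_new, alpha=1.0, gamma=1.0):
    """Closed-form kernel ridge regression: dual weights (K + alpha I)^-1 y."""
    K = rbf_kernel(x, x, gamma)
    dual = np.linalg.solve(K + alpha * np.eye(len(x)), y)
    return rbf_kernel(x_new, x, gamma) @ dual

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([10.0, 12.0, 11.0, 13.0])

# Near-zero alpha: the fit almost interpolates the training targets
near_interp = kernel_ridge_fit_predict(x, y, x, alpha=1e-8)
# Large alpha with no intercept: predictions shrink towards 0, not towards mean(y)
shrunk = kernel_ridge_fit_predict(x, y, x, alpha=1e6)
# Centring y first makes the same heavy penalty shrink towards the mean instead
centred = kernel_ridge_fit_predict(x, y - y.mean(), x, alpha=1e6) + y.mean()
print(near_interp, shrunk, centred)
```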
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Generate the data set
num_points = 200
x = np.linspace(0, 1, num_points)
np.random.seed(0) # for reproducibility
epsilon = np.random.normal(0, 0.2, num_points)
y = 0.3 * np.sin(2 * np.pi * x) + epsilon
# Construct the design matrix
X = np.column_stack([x**3, x**2, x, np.ones_like(x)])
# Fit a single linear unit (a perceptron with identity activation) by least squares;
# with the cubic features above this is ordinary linear regression
model = LinearRegression(fit_intercept=False)
model.fit(X, y)
# Predict using the model
y_pred = model.predict(X)
# Plotting
plt.figure(figsize=(10, 6))
plt.scatter(x, y, s=10, color='blue', label='Generated Data', edgecolors='black')
plt.plot(x, y_pred, color='red', label='Best fit using perceptron (least squares)')
plt.title('y = 0.3 sin(2πx) + ε')
plt.xlabel('x')
plt.ylabel('y(x)')
plt.legend()
plt.grid(True)
plt.show()
# Print the weights
weights = model.coef_
print(f"w3 = {weights[0]:.2f}, w2 = {weights[1]:.2f}, w1 = {weights[2]:.2f}, w0 = {weights[3]:.2f}")
w3 = 5.22, w2 = -7.76, w1 = 2.46, w0 = 0.07
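Because the "perceptron" here has an identity activation and is fitted by least squares, its weights are just the cubic least-squares coefficients. A sketch that regenerates the same data and solves the normal equations directly, cross-checked with `np.polyfit`, should reproduce the weights printed above:

```python
import numpy as np

# Regenerate the same noisy sine data as the cell above (same seed, same draw)
num_points = 200
x = np.linspace(0, 1, num_points)
np.random.seed(0)
y = 0.3 * np.sin(2 * np.pi * x) + np.random.normal(0, 0.2, num_points)

# Least squares on the cubic design matrix via the normal equations X^T X w = X^T y
X = np.column_stack([x**3, x**2, x, np.ones_like(x)])
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# np.polyfit returns coefficients highest power first, matching [w3, w2, w1, w0]
w_polyfit = np.polyfit(x, y, deg=3)
print(np.round(w_normal, 2), np.round(w_polyfit, 2))
```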
ARIMA
import pandas as pd
# Load the quarterly emissions CSV file
file_path = 'book_auarterly emissions.csv'
data = pd.read_csv(file_path)
data
| | NC Sector | Q1 2008 | Q2 2008 | Q3 2008 | Q4 2008 | Q1 2009 | Q2 2009 | Q3 2009 | Q4 2009 | Q1 2010 | ... | Q3 2020 | Q4 2020 | Q1 2021 | Q2 2021 | Q3 2021 | Q4 2021 | Q1 2022 | Q2 2022 | Q3 2022 | Q4 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Energy supply | 58.3 | 49.7 | 46.5 | 58.4 | 58.3 | 41.7 | 39.4 | 50.5 | 55.3 | ... | 18.9 | 20.8 | 21.6 | 19.1 | 19.6 | 20.5 | 20.6 | 19.9 | 21.3 | 20.3 |
| 1 | Business | 26.8 | 21.1 | 17.5 | 23.2 | 23.2 | 16.5 | 15.7 | 20.4 | 23.5 | ... | 13.3 | 16.8 | 19.3 | 14.9 | 13.4 | 16.9 | 18.9 | 13.8 | 12.9 | 16.4 |
| 2 | Transport | 33.4 | 32.9 | 32.8 | 31.5 | 30.9 | 31.9 | 32.2 | 30.6 | 29.6 | ... | 25.6 | 26.1 | 21.9 | 27.5 | 29.5 | 29.5 | 27.0 | 29.5 | 28.2 | 27.8 |
| 3 | Public | 3.4 | 2.3 | 1.2 | 2.8 | 2.8 | 1.8 | 1.4 | 2.9 | 3.0 | ... | 1.0 | 2.1 | 2.8 | 1.3 | 1.1 | 2.3 | 2.8 | 1.3 | 1.2 | 2.3 |
| 4 | Residential | 31.0 | 13.3 | 7.3 | 26.7 | 31.0 | 11.3 | 7.2 | 25.6 | 34.8 | ... | 5.2 | 22.5 | 28.3 | 13.8 | 4.9 | 20.5 | 24.0 | 9.8 | 4.8 | 17.9 |
| 5 | Other sectors [note 8] | 4.6 | 4.4 | 4.3 | 4.3 | 3.0 | 2.9 | 2.9 | 3.0 | 3.1 | ... | 2.2 | 2.3 | 2.7 | 2.6 | 2.6 | 2.8 | 2.8 | 2.6 | 2.6 | 2.9 |
| 6 | Total CO2 | 157.5 | 123.9 | 109.6 | 146.9 | 149.3 | 106.1 | 98.7 | 133.0 | 149.2 | ... | 66.2 | 90.6 | 96.6 | 79.3 | 71.1 | 92.6 | 96.0 | 77.0 | 70.9 | 87.6 |
| 7 | Other greenhouse gases [note 6] | 29.5 | 29.5 | 29.5 | 29.5 | 28.0 | 28.0 | 28.0 | 28.0 | 27.0 | ... | 21.9 | 21.9 | 21.7 | 21.7 | 21.7 | 21.7 | 21.4 | 21.4 | 21.4 | 21.4 |
| 8 | Total greenhouse gas emissions | 187.0 | 153.4 | 139.1 | 176.4 | 177.3 | 134.1 | 126.7 | 161.0 | 176.2 | ... | 88.2 | 112.5 | 118.4 | 101.0 | 92.8 | 114.3 | 117.4 | 98.4 | 92.3 | 109.0 |
9 rows × 61 columns
from datetime import datetime
import matplotlib.dates as mdates
data_long = pd.melt(data, id_vars=['NC Sector'], var_name='Quarter', value_name='Emissions')
# Convert a quarter label such as 'Q1 2008' to the first day of that quarter
def quarter_to_datetime(quarter_str):
    parts = quarter_str.split()
    year = int(parts[1])
    quarter = int(parts[0][1])
    month = (quarter - 1) * 3 + 1
    return datetime(year, month, 1)
# Apply the conversion to every quarter label
data_long['Quarter'] = data_long['Quarter'].apply(quarter_to_datetime)
# Plotting the data
plt.figure(figsize=(15, 8))
for sector in data_long['NC Sector'].unique():
    sector_data = data_long[data_long['NC Sector'] == sector]
    plt.plot(sector_data['Quarter'], sector_data['Emissions'], label=sector)
plt.title('Quarterly Emissions by Sector (2008-2022)')
plt.xlabel('Year')
plt.ylabel('Emissions')
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.gca().xaxis.set_major_locator(mdates.YearLocator())
plt.legend()
plt.grid(True)
plt.show()
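pandas can do the same quarter conversion with its `Period` type once each label is reordered from `'Q1 2008'` to `'2008Q1'`; this also yields an explicitly quarterly index, which statsmodels otherwise has to infer (see the `ValueWarning` in the ARIMA output below). A sketch on four sample labels:

```python
import pandas as pd

quarters = pd.Series(["Q1 2008", "Q2 2008", "Q3 2008", "Q4 2008"])

# Reorder 'Q1 2008' to '2008Q1', which pandas parses as a quarterly Period
parts = quarters.str.split()
period_strings = (parts.str[1] + parts.str[0]).tolist()
periods = pd.PeriodIndex(period_strings, freq="Q")

# to_timestamp() gives the first day of each quarter, like quarter_to_datetime above
starts = periods.to_timestamp()
print(starts)
```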
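# Total emissions per quarter: take the 'Total greenhouse gas emissions' row, because summing
# every row would add the sector rows, 'Total CO2' and the totals row together (double counting)
total_emissions = (data_long[data_long['NC Sector'] == 'Total greenhouse gas emissions']
                   .set_index('Quarter')[['Emissions']].copy())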
# Plotting the total emissions
plt.figure(figsize=(15, 8))
plt.plot(total_emissions.index, total_emissions['Emissions'], label='Total Emissions', color='red')
plt.title('Total Quarterly Emissions (2008-2022)')
plt.xlabel('Year')
plt.ylabel('Emissions')
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.gca().xaxis.set_major_locator(mdates.YearLocator())
plt.grid(True)
plt.legend()
plt.show()
from statsmodels.tsa.seasonal import seasonal_decompose
# Performing a seasonal decomposition of total emissions
decomposition = seasonal_decompose(total_emissions['Emissions'], model='additive', period=4)
# Extracting the trend, seasonal, and residual components
trend = decomposition.trend.dropna()
seasonal = decomposition.seasonal
residual = decomposition.resid.dropna()
# Plotting the decomposition
plt.figure(figsize=(18, 12))
# Trend
plt.subplot(411)
plt.plot(trend, label='Trend', color='blue')
plt.legend(loc='best')
plt.title('Trend in Total Quarterly Emissions')
# Seasonal
plt.subplot(412)
plt.plot(seasonal, label='Seasonality', color='green')
plt.legend(loc='best')
plt.title('Seasonal Variations in Total Quarterly Emissions')
# Residual
plt.subplot(413)
plt.plot(residual, label='Residuals', color='orange')
plt.legend(loc='best')
plt.title('Residuals of Total Quarterly Emissions')
# Original
plt.subplot(414)
plt.plot(total_emissions['Emissions'], label='Original', color='red')
plt.legend(loc='best')
plt.title('Total Quarterly Emissions')
plt.tight_layout()
plt.show()
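The decomposition uses `period=4`, whereas the ARIMA(1, 1, 1) fitted below has no seasonal terms. One common way to handle a quarterly cycle is seasonal differencing at lag 4; in statsmodels' `ARIMA` this corresponds to the `seasonal_order=(P, D, Q, 4)` argument. A noise-free toy series (made-up trend and seasonal offsets) shows why the lag-4 difference removes the cycle:

```python
import numpy as np

# Hypothetical quarterly series: linear decline plus a fixed Q1-Q4 pattern (no noise)
n_years = 5
t = np.arange(4 * n_years)
pattern = np.array([25.0, -5.0, -20.0, 0.0])   # made-up seasonal offsets
series = 180.0 - 1.5 * t + np.tile(pattern, n_years)

# Lag-4 (seasonal) difference y_t - y_{t-4}: the repeating quarterly pattern cancels
seasonal_diff = series[4:] - series[:-4]
print(seasonal_diff)   # constant: four quarters of trend, 4 * -1.5 = -6.0
```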
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# plot_pacf only allows lags below half the sample size, so use len // 2 - 1
max_lags = len(total_emissions) // 2 - 1
# Plotting ACF and PACF with the corrected number of lags
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
# Plot the ACF
plot_acf(total_emissions['Emissions'], lags=max_lags, ax=axes[0])
# Plot the PACF
plot_pacf(total_emissions['Emissions'], lags=max_lags, ax=axes[1])
plt.show()
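Each bar in the ACF plot is a sample autocorrelation: the covariance between the series and a lagged copy of itself, divided by the variance. A minimal sketch of the usual estimator on a toy series with a four-quarter pattern shows how such a cycle appears as a strong positive value at lag 4 and a negative one at lag 2:

```python
import numpy as np

def sample_acf(x, lag):
    """Sample autocorrelation at a given lag (deviations from the full-sample mean)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[:-lag] * d[lag:]) / np.sum(d ** 2)

# Toy quarterly series with a repeating pattern and no trend (made-up values)
x = np.tile([3.0, 1.0, -1.0, 1.0], 10)
print(round(sample_acf(x, 2), 3), round(sample_acf(x, 4), 3))
```

The lag-4 value is 0.9 rather than 1 because the estimator divides by the full-sample sum of squares while the lagged products cover only the overlapping points.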
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
import numpy as np
# Testing for stationarity with the Augmented Dickey-Fuller test
adf_test = adfuller(total_emissions['Emissions'])
# If the p-value is greater than 0.05, we cannot reject a unit root, so treat the series as non-stationary
if adf_test[1] > 0.05:
    # Take the first difference to make the series stationary (the first value becomes NaN)
    total_emissions['Emissions_diff'] = total_emissions['Emissions'].diff()
else:
    total_emissions['Emissions_diff'] = total_emissions['Emissions']
# Running the ADF test on the differenced series
adf_test_diff = adfuller(total_emissions['Emissions_diff'].dropna())
# Assuming an ARIMA model with 1 AR term (from PACF) and 1 MA term (from ACF)
# The differencing order is 1 if needed to difference, otherwise, it's 0.
d = 1 if adf_test[1] > 0.05 else 0
p = 1 # from PACF
q = 1 # from ACF
# Fit the ARIMA model
arima_model = ARIMA(total_emissions['Emissions'], order=(p, d, q))
arima_result = arima_model.fit()
# Summary of the model
arima_summary = arima_result.summary()
arima_summary
c:\ProgramData\anaconda_3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency QS-OCT will be used. self._init_dates(dates, freq)
| Dep. Variable: | Emissions | No. Observations: | 60 |
|---|---|---|---|
| Model: | ARIMA(1, 1, 1) | Log Likelihood | -321.549 |
| Date: | Thu, 11 Jan 2024 | AIC | 649.099 |
| Time: | 08:42:31 | BIC | 655.331 |
| Sample: | 01-01-2008 - 10-01-2022 | HQIC | 651.532 |
| Covariance Type: | opg | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| ar.L1 | 0.1138 | 0.170 | 0.669 | 0.503 | -0.220 | 0.447 |
| ma.L1 | -0.8330 | 0.132 | -6.296 | 0.000 | -1.092 | -0.574 |
| sigma2 | 3122.7480 | 1132.184 | 2.758 | 0.006 | 903.707 | 5341.789 |

| Ljung-Box (L1) (Q): | 0.04 | Jarque-Bera (JB): | 6.42 |
|---|---|---|---|
| Prob(Q): | 0.83 | Prob(JB): | 0.04 |
| Heteroskedasticity (H): | 0.47 | Skew: | -0.00 |
| Prob(H) (two-sided): | 0.10 | Kurtosis: | 1.38 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
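Written out, the fitted model describes the period-to-period change in emissions (statsmodels inferred a quarterly QS-OCT frequency, so t indexes quarters here). statsmodels writes the MA polynomial with a plus sign and, for d = 1, includes no constant by default, which is consistent with the summary listing no const term; with the estimates above the model reads:

$$
\Delta y_t = 0.1138\,\Delta y_{t-1} + \varepsilon_t - 0.8330\,\varepsilon_{t-1},
\qquad \Delta y_t = y_t - y_{t-1},
\qquad \varepsilon_t \sim \mathcal{N}(0,\ 3122.75)
$$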
ARIMA (AutoRegressive Integrated Moving Average) Model Output Analysis
- Dep. Variable: Emissions: The dependent variable (the variable being predicted) is 'Emissions'.
- No. Observations: 60: There are 60 data points or observations in the time series.
- Model: ARIMA(1, 1, 1): The ARIMA model used has parameters (p=1, d=1, q=1).
- p (AR term): 1 indicates one lag is used in the autoregressive part.
- d (Differencing): 1 means the series is differenced once before the ARMA terms are fitted, to make it stationary.
- q (MA term): 1 indicates the moving average part is based on one lagged forecast error.
- Log Likelihood: -321.549: The log likelihood of the fitted model; higher (less negative) values indicate a better fit to the data.
- AIC: 649.099: The Akaike Information Criterion, which balances goodness of fit against the number of estimated parameters. Lower AIC values indicate a better model.
- BIC: 655.331: The Bayesian Information Criterion, which penalizes additional parameters more heavily than AIC in samples of this size. Like AIC, lower is better.
- HQIC: 651.532: The Hannan-Quinn Information Criterion, another penalized-likelihood criterion for model selection; lower is better.
- coef (Coefficients):
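- ar.L1 (0.1138): The coefficient of the AR term. It is small and positive but not statistically significant (p = 0.503), and its 95% confidence interval (-0.220 to 0.447) includes zero.
- ma.L1 (-0.8330): The coefficient of the MA term. It is large, negative and highly significant: the previous period's forecast error enters the current change with a weight of about -0.83.
- P>|z|: P-values for the coefficients. Values below 0.05 are conventionally taken as statistically significant; here the MA term is significant and the AR term is not.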
- [0.025 0.975]: The 95% confidence interval for the coefficients.
- sigma2 (3122.7480): The variance of the residuals.
- Diagnostics:
- Ljung-Box Test: A test for autocorrelation in the residuals, here at lag 1 (L1). The high p-value (0.83) means there is no evidence of residual autocorrelation, which is good.
- Jarque-Bera Test: A test for normality of residuals. A low p-value (here, 0.04) suggests non-normality, which could be a concern.
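- Heteroskedasticity Test: A test for constant variance of the residuals. Its p-value (0.10) is above 0.05, so there is no significant evidence of heteroskedasticity at the 5% level, although H = 0.47 suggests the residual variance is roughly halved in the later part of the sample.
- Skew and Kurtosis: Measures of the shape of the residual distribution. The skewness is close to 0 (ideal), but the kurtosis of 1.38 is well below the normal value of 3, indicating a flatter, lighter-tailed distribution than the normal; this, rather than skew, is what makes the Jarque-Bera test significant.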
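The AIC, BIC and HQIC in the summary are penalized versions of the log likelihood: AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L and HQIC = 2k ln(ln n) − 2 ln L. The sketch below reproduces them from the printed log likelihood with k = 3 estimated parameters (ar.L1, ma.L1, sigma2). The value n = 59 is an inference, not something the summary prints: it is the 60 observations minus the one lost to first differencing, and it is the choice that matches the printed BIC and HQIC. `information_criteria` is a hypothetical helper.

```python
import math

def information_criteria(llf, k, n):
    """AIC, BIC and HQIC from a log likelihood llf, k estimated parameters and n observations."""
    aic = 2 * k - 2 * llf
    bic = k * math.log(n) - 2 * llf
    hqic = 2 * k * math.log(math.log(n)) - 2 * llf
    return aic, bic, hqic

# Values from the ARIMA(1, 1, 1) summary; n = 59 is an assumption (60 observations minus one for differencing)
aic, bic, hqic = information_criteria(llf=-321.549, k=3, n=59)
print(f"AIC={aic:.3f}, BIC={bic:.3f}, HQIC={hqic:.3f}")
```

The small differences in the third decimal place come from the log likelihood being printed to only three decimals.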
Overall, this ARIMA model seems to fit the data reasonably well: the MA coefficient is significant and the residuals show no autocorrelation. However, the AR term is not significant, and the Jarque-Bera test indicates non-normal residuals, so a simpler specification and the reliability of the confidence intervals are areas worth exploring to improve the model.
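The Jarque-Bera statistic can be checked by hand from the printed skew and kurtosis: JB = n/6 · (S² + (K − 3)²/4), compared against a chi-squared distribution with 2 degrees of freedom, whose survival function is exp(−x/2). A minimal sketch, assuming n = 59 residuals (the 60 observations minus the one lost to first differencing; this is an assumption about how statsmodels counts them). With K rounded to 1.38 it gives JB ≈ 6.45 rather than the printed 6.42, and a p-value of about 0.04. `jarque_bera` is a hypothetical helper.

```python
import math

def jarque_bera(skew, kurtosis, n):
    """JB statistic and chi-squared(2) p-value from sample skewness and (non-excess) kurtosis."""
    jb = n / 6 * (skew ** 2 + (kurtosis - 3) ** 2 / 4)
    # The chi-squared distribution with 2 degrees of freedom has survival function exp(-x / 2)
    p_value = math.exp(-jb / 2)
    return jb, p_value

# Rounded values from the summary table; n = 59 is an assumption
jb, jb_p_value = jarque_bera(skew=0.0, kurtosis=1.38, n=59)
print(f"JB={jb:.2f}, p={jb_p_value:.3f}")
```

Because the skew is essentially zero, the whole statistic comes from the kurtosis term.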
from sklearn.model_selection import GridSearchCV
from sklearn.kernel_ridge import KernelRidge
from sklearn.pipeline import make_pipeline
import numpy as np
import matplotlib.pyplot as plt
# Hyperparameter grid; the 'kernelridge__' prefix addresses the KernelRidge step named by make_pipeline
param_grid = {
    'kernelridge__alpha': [0.1, 1, 10],
    'kernelridge__gamma': np.logspace(-2, 2, 5)
}
# Create a pipeline containing a single RBF kernel ridge regression step
kr_model = KernelRidge(kernel='rbf')
# Initialize GridSearchCV with the pipeline and parameter grid (5-fold cross-validation)
grid_search = GridSearchCV(make_pipeline(kr_model), param_grid, cv=5)
# Fit the grid search to the data (X and y as defined earlier)
grid_search.fit(X, y)
# Best parameters found
best_params = grid_search.best_params_
best_score = grid_search.best_score_
# Predict using the best model (stored in a variable, since no DataFrame named df exists here)
y_pred_kernel = grid_search.predict(X)
# Plot the original data and the best kernel regression fit
plt.figure(figsize=(14, 7))
plt.scatter(X, y, color='black', label='Actual Emission')
plt.plot(X, y_pred_kernel, color='orange', linestyle='--', linewidth=2, label='Optimized Kernel Regression')
plt.title('West Midlands Region Annual Emissions for All Sectors (2005-2021) - Optimized Kernel Regression Fit')
plt.xlabel('Calendar Year')
plt.ylabel('Total Emission')
plt.legend()
plt.show()
(best_params, best_score)
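Kernel ridge regression has a closed form: with the RBF kernel matrix K, it solves (K + αI)c = y for dual coefficients c and predicts K(X_new, X_train)·c. To our understanding this is the system scikit-learn's KernelRidge solves, and it has no intercept term, so far from the training data the prediction decays towards zero; with gamma applied to raw calendar years, the kernel's length scale is also measured in years. A numpy sketch with hypothetical function names and illustrative toy data on a standardized axis:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative rounding errors

def kernel_ridge_predict(X_train, y_train, X_new, alpha, gamma):
    """Solve (K + alpha * I) c = y for the dual coefficients, then predict K(X_new, X_train) @ c."""
    K = rbf_kernel(X_train, X_train, gamma)
    dual_coef = np.linalg.solve(K + alpha * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_new, X_train, gamma) @ dual_coef

# Toy inputs on a standardized "year" axis (illustrative, not the emissions data)
X_toy = np.linspace(-1.5, 1.5, 8).reshape(-1, 1)
y_toy = np.sin(X_toy).ravel()

# A larger alpha shrinks the fitted values more strongly towards zero
fit_weak = kernel_ridge_predict(X_toy, y_toy, X_toy, alpha=1e-3, gamma=1.0)
fit_strong = kernel_ridge_predict(X_toy, y_toy, X_toy, alpha=10.0, gamma=1.0)

# Far outside the training range every kernel value is ~0, so the prediction falls to ~0
pred_far = kernel_ridge_predict(X_toy, y_toy, np.array([[50.0]]), alpha=1.0, gamma=1.0)
```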
# Reload the data from the '1_1' sheet
data_1_1 = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name='1_1')
# Extract the actual data, excluding the header information
data_1_1_actual = data_1_1.iloc[4:]
# Set the column names from the header row
data_1_1_actual.columns = data_1_1.iloc[3]
# Reset the index for the actual data
data_1_1_actual = data_1_1_actual.reset_index(drop=True)
# Convert the 'Grand Total' column to a numeric type
data_1_1_actual['Grand Total'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce')
# Filter the data for 'National Total' and group by 'Calendar Year' to get yearly national emissions
national_emissions = data_1_1_actual[data_1_1_actual['Region/Country'] == 'National Total']
national_time_series = national_emissions.groupby('Calendar Year')['Grand Total'].sum()
# Plotting the national total emissions time series
plt.figure(figsize=(12, 6))
national_time_series.plot(marker='o', linestyle='-')
plt.title('National Total Emissions Over Time (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.grid(True)
plt.show()
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings("ignore")
# Reload the data from the '1_1' sheet
data_1_1 = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name='1_1')
# Extract the actual data, excluding the header information
data_1_1_actual = data_1_1.iloc[4:]
# Set the column names from the header row
data_1_1_actual.columns = data_1_1.iloc[3]
# Reset the index for the actual data
data_1_1_actual = data_1_1_actual.reset_index(drop=True)
# Convert the 'Grand Total' column to a numeric type
data_1_1_actual['Grand Total'] = pd.to_numeric(data_1_1_actual['Grand Total'], errors='coerce')
# Filter the data for 'National Total' and group by 'Calendar Year' to get yearly national emissions
national_emissions = data_1_1_actual[data_1_1_actual['Region/Country'] == 'National Total']
national_time_series = national_emissions.groupby('Calendar Year')['Grand Total'].sum()
# Function to find the best ARIMA parameters with the lowest AIC value
# Note: models with different d are fitted to differently differenced data, so AIC comparisons across d need caution
def best_arima_model(data):
    best_aic = float('inf')
    best_order = None
    for p in range(5):
        for d in range(2):  # d is limited to 0 or 1, as higher orders may not be needed
            for q in range(5):
                try:
                    model = ARIMA(data, order=(p, d, q))
                    model_fit = model.fit()
                except Exception:
                    continue  # skip orders that fail to estimate
                if model_fit.aic < best_aic:
                    best_aic = model_fit.aic
                    best_order = (p, d, q)
    return best_order
# Finding the best ARIMA parameters for 'Grand Total' time series
best_p, best_d, best_q = best_arima_model(national_time_series)
# Print the best parameters
print(f"Best ARIMA Parameters (p, d, q) for 'Grand Total': ({best_p}, {best_d}, {best_q})")
Best ARIMA Parameters (p, d, q) for 'Grand Total': (1, 1, 1)
from statsmodels.tsa.seasonal import seasonal_decompose
# The index holds integer calendar years with no frequency, so seasonal_decompose needs an explicit period.
# Annual data has no within-year seasonality, so period=1 is used; the decomposition is then degenerate:
# the trend equals the original series and the seasonal and residual components are zero.
decomposition = seasonal_decompose(national_time_series, model='additive', period=1)
# Plot the original series and each decomposition component in its own panel
components = [
    (national_time_series, 'Original', 'National Total Emissions Over Time'),
    (decomposition.trend, 'Trend', 'Trend Component'),
    (decomposition.seasonal, 'Seasonal', 'Seasonal Component'),
    (decomposition.resid, 'Residual', 'Residual Component'),
]
plt.figure(figsize=(14, 12))
for i, (series, label, title) in enumerate(components, start=1):
    plt.subplot(4, 1, i)
    plt.plot(series, label=label)
    plt.legend(loc='upper left')
    plt.title(title)
    plt.grid(True)
plt.tight_layout()
plt.show()
from statsmodels.tsa.stattools import adfuller
# Conduct Augmented Dickey-Fuller test
adf_result = adfuller(national_time_series)
# Extract the p-value
p_value = adf_result[1]
p_value
0.9171724950612403
# Apply first-order differencing to the time series
national_time_series_diff = national_time_series.diff().dropna()
# Conduct Augmented Dickey-Fuller test on the differenced series
adf_result_diff = adfuller(national_time_series_diff)
# Extract the p-value for the differenced series
p_value_diff = adf_result_diff[1]
p_value_diff
0.22240697481609656
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings("ignore")
# Split the data chronologically: the first 75% of observations for training, the rest for testing
train_size = int(len(time_series_data) * 0.75)
train, test = time_series_data[0:train_size], time_series_data[train_size:]
# Select the ARIMA order with the lowest AIC on the training data only,
# reusing the best_arima_model function defined earlier
best_p, best_d, best_q = best_arima_model(train)
# Refit with the selected order and forecast the held-out period
model = ARIMA(train, order=(best_p, best_d, best_q))
model_fit = model.fit()
forecast = model_fit.forecast(steps=len(test))
best_p, best_d, best_q
The p-value for the differenced series is approximately 0.222. Although this is lower than the original p-value (about 0.917), it is still above the 0.05 significance level, so the ADF test cannot reject the null hypothesis of a unit root, and the differenced series is also treated as non-stationary. With only 16 differenced annual observations, however, the test has low power, so this conclusion is tentative.
This might call for further differencing, transformations, or models that handle non-stationary data (such as ARIMA with an integration order greater than 1).
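Why an integration order of 2 can help: differencing once turns a quadratic trend into a linear one, and differencing a second time leaves a constant, which is what ARIMA's d = 2 does internally. A toy illustration; the quadratic's coefficients are made up and are not fitted to the emissions data.

```python
import numpy as np

years = np.arange(2005, 2022)
t = years - years[0]
# Toy series with a quadratic trend (illustrative only)
series = 500.0 - 8.0 * t + 0.3 * t**2

first_diff = np.diff(series)         # still trending: -7.7 + 0.6 * t
second_diff = np.diff(series, n=2)   # constant 0.6: the quadratic trend is removed
print(first_diff[:3], second_diff[:3])
```

Each round of differencing also shortens the series by one observation, which matters when only 17 annual values are available.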
# Adjusting the number of lags for the short time series and plotting ACF and PACF again
plt.figure(figsize=(14, 6))
plt.subplot(1, 2, 1)
plot_acf(national_time_series_diff, lags=7, ax=plt.gca())
plt.title('Autocorrelation Function (ACF)')
plt.subplot(1, 2, 2)
plot_pacf(national_time_series_diff, lags=7, ax=plt.gca())
plt.title('Partial Autocorrelation Function (PACF)')
plt.tight_layout()
plt.show()
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
# Split the data into train and test sets (the first 80% of years for training)
train_size = int(len(national_time_series) * 0.8)
train, test = national_time_series[0:train_size], national_time_series[train_size:]
# Fit the ARIMA(1,2,2) model (d=2 because the first-differenced series still failed the ADF test)
model = ARIMA(train, order=(1, 2, 2))
model_fit = model.fit()  # statsmodels.tsa.arima.model.ARIMA.fit() does not accept a 'disp' argument
# Forecast emissions for every year in the test set (the full forecast, not just its first value)
forecast = model_fit.forecast(steps=len(test))
# Calculate the mean squared error between actual and forecast values
mse = mean_squared_error(test.values, forecast)
mse
The mean squared error (MSE) of the ARIMA(1,2,2) model on the test set is approximately 398,563,329.78 (kt CO2e)², which corresponds to a root mean squared error (RMSE) of about 19,964 kt CO2e.
plt.figure(figsize=(12, 6))
plt.plot(train.index, train.values, label='Training Data', color='blue', marker='o')
plt.plot(test.index, test.values, label='Actual Test Data', color='green', marker='o')
plt.plot(test.index, forecast, label='Forecasted Data', color='red', linestyle='--')
plt.title('National Total Emissions: Actual vs. Forecasted')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
The plot illustrates the actual vs. forecasted national total emissions:
- Blue line: training data used to fit the ARIMA model.
- Green line: actual emissions in the test set.
- Red dashed line: emissions forecast by the ARIMA(1,2,2) model.
While the forecast captures the general downward trend in emissions, there are discrepancies between the actual and forecasted values, which are reflected in the MSE we computed earlier.
From this visualization, one can assess the potential utility of the model and consider making adjustments or exploring other models for improved forecasting accuracy.
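A useful yardstick when comparing forecasting models is a naive benchmark scored with walk-forward (rolling-origin) validation, which trains on an expanding window of past years and forecasts the next one. The sketch below swaps in a random-walk-with-drift forecaster as that benchmark; it is not part of the original analysis, the function names are hypothetical, and the linear toy series is illustrative.

```python
import numpy as np

def drift_forecast(history, steps):
    """Random walk with drift: extend the last value by the average historical change."""
    history = np.asarray(history, dtype=float)
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + drift * np.arange(1, steps + 1)

def rolling_origin_mse(series, min_train, forecaster):
    """One-step-ahead MSE over expanding training windows (walk-forward validation)."""
    series = np.asarray(series, dtype=float)
    errors = []
    for end in range(min_train, len(series)):
        prediction = forecaster(series[:end], steps=1)[0]
        errors.append(series[end] - prediction)
    return float(np.mean(np.square(errors)))

# Toy check: a perfectly linear series is forecast exactly by the drift method
linear = 100.0 - 2.5 * np.arange(17)
print(drift_forecast(linear[:13], steps=4))
print(rolling_origin_mse(linear, min_train=5, forecaster=drift_forecast))
```

To score the ARIMA model the same way, one could pass a forecaster that refits `ARIMA(history, order=(1, 2, 2))` on each window and returns its `forecast(steps)`; with only 17 annual points each window is short, so the resulting scores will be noisy.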
from sklearn.linear_model import LinearRegression
import numpy as np
# Prepare the data
X = np.array(national_time_series.index).reshape(-1, 1) # Predictor variable (Year)
y = national_time_series.values # Response variable (Emissions)
# Initialize the linear regression model
lr_model = LinearRegression()
# Fit the model to the data
lr_model.fit(X, y)
# Predict emissions using the linear regression model
y_pred = lr_model.predict(X)
# Plotting the actual data and the linear regression predictions
plt.figure(figsize=(12, 6))
plt.plot(X, y, label='Actual Emissions', color='blue', marker='o')
plt.plot(X, y_pred, label='Linear Regression Predictions', color='red', linestyle='--')
plt.title('National Total Emissions: Actual vs. Linear Regression Predictions')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Compute the Mean Squared Error (MSE)
mse = mean_squared_error(y, y_pred)
# Compute the Mean Absolute Error (MAE)
mae = mean_absolute_error(y, y_pred)
# Compute the R-squared (R2) score
r2 = r2_score(y, y_pred)
mse, mae, r2
- Mean Squared Error (MSE): 197,621,328.55 (kt CO2e)²
- Mean Absolute Error (MAE): 11,708.53 kt CO2e
- R-squared (R²): 0.9742

The R² value is high: the linear trend explains approximately 97.42% of the variance in the emissions data. However, the absolute errors are sizable (an MAE of about 11,709 kt CO2e and an RMSE of about 14,058 kt CO2e), so there is room for improvement in the model's predictions. These metrics are computed on the same years used to fit the model, so they describe in-sample fit rather than forecasting accuracy.
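For reference, the three metrics follow directly from their definitions; for 1-D inputs these should agree with scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score`. A numpy sketch with a hypothetical helper name and toy values:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, MAE and R^2 computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    mae = np.mean(np.abs(residuals))
    # R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, mae, r2

# Toy values (not emissions data)
toy_mse, toy_mae, toy_r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

Taking the square root of the MSE (the RMSE) puts the error back into the units of the data, which makes it easier to compare with the MAE.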
# Introduce a quadratic term
X_quadratic = np.column_stack((X, X**2))
# Initialize and fit the quadratic regression model
lr_quadratic = LinearRegression()
lr_quadratic.fit(X_quadratic, y)
# Predict emissions using the quadratic regression model
y_pred_quadratic = lr_quadratic.predict(X_quadratic)
# Compute performance metrics for the quadratic model
mse_quadratic = mean_squared_error(y, y_pred_quadratic)
mae_quadratic = mean_absolute_error(y, y_pred_quadratic)
r2_quadratic = r2_score(y, y_pred_quadratic)
mse_quadratic, mae_quadratic, r2_quadratic
# Plotting the actual data, linear regression predictions, and quadratic regression predictions
plt.figure(figsize=(12, 6))
plt.plot(X, y, label='Actual Emissions', color='blue', marker='o')
plt.plot(X, y_pred, label='Linear Regression Predictions', color='green', linestyle='--')
plt.plot(X, y_pred_quadratic, label='Quadratic Regression Predictions', color='red', linestyle='--')
plt.title('National Total Emissions: Actual vs. Regression Predictions')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
from sklearn.preprocessing import PolynomialFeatures
# Generate polynomial features (up to degree 3)
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
# Initialize and fit the polynomial regression model
lr_poly = LinearRegression()
lr_poly.fit(X_poly, y)
# Predict emissions using the polynomial regression model
y_pred_poly = lr_poly.predict(X_poly)
# Plotting the actual data and the polynomial regression predictions
plt.figure(figsize=(12, 6))
plt.plot(X, y, label='Actual Emissions', color='blue', marker='o')
plt.plot(X, y_pred_poly, label='Polynomial Regression Predictions (Degree 3)', color='purple', linestyle='--')
plt.title('National Total Emissions: Actual vs. Polynomial Regression Predictions')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
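Fitting a cubic in raw calendar years, as the cell above does, builds columns ranging from 1 to roughly 8 × 10⁹, which makes the least-squares problem badly conditioned. A common remedy, not used in the original analysis, is to centre the years first. The sketch compares the condition numbers of the two cubic design matrices; `np.vander(..., increasing=True)` gives the same columns (1, x, x², x³) that `PolynomialFeatures(degree=3)` produces for a single feature.

```python
import numpy as np

years = np.arange(2005, 2022, dtype=float)

# Cubic design matrices (columns 1, x, x^2, x^3) on raw and on centred years
raw_design = np.vander(years, N=4, increasing=True)
centred = years - years.mean()
centred_design = np.vander(centred, N=4, increasing=True)

# A large condition number means least squares is very sensitive to rounding error
cond_raw = np.linalg.cond(raw_design)
cond_centred = np.linalg.cond(centred_design)
print(f"raw: {cond_raw:.2e}, centred: {cond_centred:.2e}")
```

Centring changes only the parameterization, not the space of cubic curves, so the fitted values are the same in exact arithmetic; the benefit is numerical stability.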
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
# Define the Ridge regression model
ridge = Ridge()
# Define hyperparameter space
param_grid = {
'alpha': [1e-3, 1e-2, 1e-1, 1, 10, 100, 1000] # Regularization strength values
}
# Initialize grid search
grid_search = GridSearchCV(ridge, param_grid, scoring='neg_mean_squared_error', cv=5)
# Perform grid search on the polynomial features
grid_search.fit(X_poly, y)
# Extract best hyperparameters and the corresponding score
best_params = grid_search.best_params_
best_score = grid_search.best_score_
best_params, best_score
# Using the best Ridge regression model to make predictions
best_ridge = grid_search.best_estimator_
y_pred_ridge = best_ridge.predict(X_poly)
# Plotting the actual data, polynomial regression predictions, and Ridge regression predictions
plt.figure(figsize=(12, 6))
plt.plot(X, y, label='Actual Emissions', color='blue', marker='o')
plt.plot(X, y_pred_poly, label='Polynomial Regression Predictions (Degree 3)', color='purple', linestyle='--')
plt.plot(X, y_pred_ridge, label=f"Ridge Regression Predictions (Degree 3, Alpha={best_params['alpha']})", color='orange', linestyle='--')
plt.title('National Total Emissions: Actual vs. Regression Predictions')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kt CO2e)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
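Ridge's penalty α‖w‖² acts on the raw coefficients, so when the polynomial columns span many orders of magnitude a single α shrinks them very unevenly, and the selected α is hard to interpret. A minimal closed-form sketch that standardizes the features first and, like scikit-learn's `Ridge` with `fit_intercept=True`, leaves the intercept unpenalized by centring y. `ridge_fit` is hypothetical and the toy quadratic is illustrative, not emissions data.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on standardized columns; returns a predict function. The intercept is unpenalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma
    y_mean = y.mean()
    # Solve (Z^T Z + alpha * I) w = Z^T (y - y_mean)
    w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))

    def predict(X_new):
        return y_mean + ((np.asarray(X_new, dtype=float) - mu) / sigma) @ w

    return predict

# Toy data: cubic features of centred time, target is an exact quadratic
t = np.arange(-8, 9, dtype=float)
X_toy = np.column_stack([t, t**2, t**3])
y_toy = 1000.0 - 30.0 * t + 0.5 * t**2
predict_weak = ridge_fit(X_toy, y_toy, alpha=1e-8)   # essentially ordinary least squares
predict_strong = ridge_fit(X_toy, y_toy, alpha=1e6)  # shrinks towards the mean of y
```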
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Reshape the data
X = np.array(national_time_series.index).reshape(-1, 1) # Predictor variable (Year)
# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)
# Scale the data (important for neural network convergence)
scaler_X = StandardScaler().fit(X_train)
scaler_y = StandardScaler().fit(y_train.reshape(-1, 1))
X_train_scaled = scaler_X.transform(X_train)
X_test_scaled = scaler_X.transform(X_test)
y_train_scaled = scaler_y.transform(y_train.reshape(-1, 1))
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1))
X_train_scaled.shape, X_test_scaled.shape, y_train_scaled.shape, y_test_scaled.shape
# Define the target variable (Emissions)
y = national_time_series.values
# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)
# Scale the data
scaler_X = StandardScaler().fit(X_train)
scaler_y = StandardScaler().fit(y_train.reshape(-1, 1))
X_train_scaled = scaler_X.transform(X_train)
X_test_scaled = scaler_X.transform(X_test)
y_train_scaled = scaler_y.transform(y_train.reshape(-1, 1))
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1))
X_train_scaled.shape, X_test_scaled.shape, y_train_scaled.shape, y_test_scaled.shape
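The scalers above z-score each variable with statistics taken from the training set only, and `scaler_y.inverse_transform` later maps the network's outputs back to kt CO2e. A minimal sketch of that arithmetic, assuming (as we understand `StandardScaler` to do) the population standard deviation (ddof=0); `standardize` is a hypothetical helper and the values are toy numbers.

```python
import numpy as np

def standardize(train_values):
    """Return scale/unscale functions using the training mean and population std (ddof=0)."""
    train_values = np.asarray(train_values, dtype=float)
    mean, std = train_values.mean(axis=0), train_values.std(axis=0)

    def scale(values):
        return (np.asarray(values, dtype=float) - mean) / std

    def unscale(z_values):
        return np.asarray(z_values, dtype=float) * std + mean

    return scale, unscale

# Toy values; the statistics come from the training rows only
scale, unscale = standardize([[10.0], [20.0], [30.0]])
z = scale([[10.0], [40.0]])  # a value outside the training range maps beyond +/- 2
```

Note that with `shuffle=True` in `train_test_split`, the held-out years are interleaved with the training years, so the test metrics measure interpolation rather than forecasting.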
# TensorFlow and Keras imports
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# Define the neural network model with dropout
model = Sequential([
Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
Dropout(0.5),
Dense(32, activation='relu'),
Dropout(0.5),
Dense(1, activation='linear')
])
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model
history = model.fit(X_train_scaled, y_train_scaled, epochs=500, validation_data=(X_test_scaled, y_test_scaled), verbose=0)
# Plotting the training and validation loss over epochs
plt.figure(figsize=(10, 5))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.grid(True)
plt.show()
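Keras' `Dropout` layer uses inverted dropout: during training it zeroes each unit with probability `rate` and scales the survivors by 1/(1 − rate), and at inference (for example in `model.predict`) it passes activations through unchanged. One consequence when reading the loss plot: the training loss is computed with dropout active, while the validation loss is not. A numpy sketch of the training-time operation; `inverted_dropout` is a hypothetical name.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Training-time dropout: zero each unit with probability `rate`, rescale survivors by 1 / (1 - rate)."""
    keep = rng.random(activations.shape) >= rate
    return np.where(keep, activations / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
a = np.ones(100_000)
dropped = inverted_dropout(a, rate=0.5, rng=rng)
# Survivors are doubled, so the mean activation stays close to the original value of 1.0
print(dropped.mean())
```

With `rate=0.5` and only a handful of training years, this injected noise is large relative to the signal, which is worth bearing in mind when interpreting the loss curves.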
import matplotlib.pyplot as plt
# Predict emissions for training and validation sets
y_train_pred_scaled = model.predict(X_train_scaled)
y_test_pred_scaled = model.predict(X_test_scaled)
# Transform predictions back to original domain
y_train_pred = scaler_y.inverse_transform(y_train_pred_scaled)
y_test_pred = scaler_y.inverse_transform(y_test_pred_scaled)
# Plot actual vs. predicted emissions for the training set
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.scatter(X_train, y_train, color='blue', label='Actual')
plt.scatter(X_train, y_train_pred, color='red', label='Predicted', marker='x')
plt.title('Training Set: Actual vs. Predicted Emissions')
plt.xlabel('Year')
plt.ylabel('Emissions')
plt.legend()
# Plot actual vs. predicted emissions for the validation set
plt.subplot(1, 2, 2)
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.scatter(X_test, y_test_pred, color='red', label='Predicted', marker='x')
plt.title('Validation Set: Actual vs. Predicted Emissions')
plt.xlabel('Year')
plt.ylabel('Emissions')
plt.legend()
plt.tight_layout()
plt.show()
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.metrics import r2_score
# Compute the Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_test_pred)
# Compute the Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_test_pred)
# Compute the R^2 score
r2 = r2_score(y_test, y_test_pred)
print(f"Mean Absolute Error (MAE) on Test Set: {mae:.2f}")
print(f"Mean Squared Error (MSE) on Test Set: {mse:.2f}")
print(f"R^2 Score on Test Set: {r2:.2f}")
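The three scores above follow simple definitions. The NumPy sketch below reproduces them, assuming `y_true` is not constant. It flattens both inputs first, which matters here because `y_test_pred` comes back from `inverse_transform` with shape (n, 1) while `y_test` is one-dimensional. `regression_metrics` is a hypothetical helper, and the four-point example is made up.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) with the same definitions scikit-learn uses."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2

mae, mse, r2 = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
print(f"MAE={mae:.2f}, MSE={mse:.2f}, R^2={r2:.2f}")  # MAE=0.25, MSE=0.25, R^2=0.80
```

With only a handful of test years, a single badly predicted year moves R² a lot, so these scores are best read alongside the plots.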
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.initializers import HeNormal
# Initialize the model
model = Sequential()
# First hidden layer (input_shape defines the input layer)
model.add(Dense(64, activation='relu', kernel_initializer=HeNormal(), input_shape=(X_train_scaled.shape[1],)))
model.add(Dropout(0.3))
# Hidden layers
model.add(Dense(32, activation='relu', kernel_initializer=HeNormal()))
model.add(Dropout(0.3))
model.add(Dense(16, activation='relu', kernel_initializer=HeNormal()))
model.add(Dropout(0.3))
# Output layer
model.add(Dense(1, activation='linear'))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model (assuming X_train_scaled, y_train_scaled are your training data)
history = model.fit(X_train_scaled, y_train_scaled, epochs=500, batch_size=32, validation_data=(X_test_scaled, y_test_scaled))
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Predict emissions for the held-out set (X_test_scaled doubles as the validation set)
# and map the predictions back from the scaled domain to kt CO2e
y_val_pred = scaler_y.inverse_transform(model.predict(X_test_scaled))
# Visual Inspection
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.scatter(X_test, y_val_pred, color='red', s=10, label='Predicted')
plt.title('Validation Set: Actual vs. Predicted Emissions')
plt.xlabel('Year')
plt.ylabel('Emissions')
plt.legend()
plt.grid(True)
plt.show()
# Quantitative Evaluation
mae = mean_absolute_error(y_test, y_val_pred)
mse = mean_squared_error(y_test, y_val_pred)
r2 = r2_score(y_test, y_val_pred)
print(f"Mean Absolute Error (MAE) on Validation Set: {mae:.2f}")
print(f"Mean Squared Error (MSE) on Validation Set: {mse:.2f}")
print(f"R^2 Score on Validation Set: {r2:.2f}")
import matplotlib.pyplot as plt
# Boxplot of the 'Grand Total' emissions column to spot outliers visually
plt.boxplot(data_1_1_actual['Grand Total'].dropna())
plt.title("Boxplot of 'Grand Total' emissions")
plt.show()
Q1 = data_1_1_actual['Grand Total'].quantile(0.25)
Q3 = data_1_1_actual['Grand Total'].quantile(0.75)
IQR = Q3 - Q1
outliers = data_1_1_actual[(data_1_1_actual['Grand Total'] < (Q1 - 1.5 * IQR)) | (data_1_1_actual['Grand Total'] > (Q3 + 1.5 * IQR))]
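The rule above is Tukey's fences: any value more than 1.5 × IQR below Q1 or above Q3 is flagged. The sketch below wraps the rule in a reusable NumPy function. `iqr_outliers` is a hypothetical helper name. `np.percentile` defaults to linear interpolation, the same default as pandas `.quantile()`.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # linear interpolation, as in pandas .quantile()
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

print(iqr_outliers([1, 2, 3, 4, 100]))  # [100.]
```

Applied here, it would be `iqr_outliers(data_1_1_actual['Grand Total'].dropna())`.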
# Loading the data from sheet "5_1"
data_5_1 = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name="5_1")
# Displaying the first few rows of the data
data_5_1.head()
# Skipping the initial rows to get to the actual data
data_5_1_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx",
sheet_name="5_1", skiprows=4)
# Displaying the first few rows of the cleaned data
data_5_1_cleaned.head()
# Dropping the first row (header info)
data_5_1_cleaned = data_5_1_cleaned.drop(0)
# Calculating the total emissions for each year
total_emissions_per_year = data_5_1_cleaned.iloc[:, 6:].sum()
total_emissions_per_year
# Visualizing the total emissions over the years
plt.figure(figsize=(14, 7))
total_emissions_per_year.plot(marker='o', linestyle='-')
plt.title('Total Emissions Over Time (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
# Grouping by regions and summing up the emissions for the entire period (2005-2021)
total_emissions_by_region = data_5_1_cleaned.groupby('Area')[data_5_1_cleaned.columns[6:]].sum().sum(axis=1)
# Identifying regions with the highest and lowest emissions
highest_emission_region = total_emissions_by_region.idxmax()
lowest_emission_region = total_emissions_by_region.idxmin()
highest_emission_value = total_emissions_by_region.max()
lowest_emission_value = total_emissions_by_region.min()
highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value
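The region totals above come from two sums. `groupby(...).sum()` adds up the rows of each region, year by year. `.sum(axis=1)` then adds the year columns together. The toy pandas example below makes both steps and the `idxmax`/`idxmin` lookup explicit. The frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame: one row per installation, one column per year
df = pd.DataFrame({
    "Region": ["A", "A", "B", "C"],
    "2005": [10.0, 5.0, 40.0, 1.0],
    "2006": [8.0, 4.0, 35.0, 2.0],
})
year_cols = ["2005", "2006"]

# Sum each region's rows per year, then add the years together: one total per region
totals = df.groupby("Region")[year_cols].sum().sum(axis=1)
print(totals.idxmax(), totals.max())   # B 75.0
print(totals.idxmin(), totals.min())   # C 3.0
```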
# Displaying the column names to identify the correct column for regions
data_5_1_cleaned.columns
# Displaying unique values in the first few columns to identify the regions
unique_values_in_columns = {}
for col in data_5_1_cleaned.columns[:5]:
    unique_values_in_columns[col] = data_5_1_cleaned[col].unique()
unique_values_in_columns
# Grouping by regions and summing up the emissions for the entire period (2005-2021)
total_emissions_by_region = data_5_1_cleaned.groupby("Freeze panes are active on this sheet. To turn off freeze panes select the 'View' ribbon then 'Freeze Panes' then 'Unfreeze Panes' or use [Alt W, F].")[data_5_1_cleaned.columns[6:]].sum().sum(axis=1)
# Identifying regions with the highest and lowest emissions
highest_emission_region = total_emissions_by_region.idxmax()
lowest_emission_region = total_emissions_by_region.idxmin()
highest_emission_value = total_emissions_by_region.max()
lowest_emission_value = total_emissions_by_region.min()
highest_emission_region, highest_emission_value, lowest_emission_region, lowest_emission_value
# Grouping by companies/entities and summing up the emissions for the entire period (2005-2021)
total_emissions_by_entity = data_5_1_cleaned.groupby("Unnamed: 1")[data_5_1_cleaned.columns[6:]].sum().sum(axis=1)
# Identifying entities/companies with the highest emissions
top_5_entities = total_emissions_by_entity.nlargest(5)
top_5_entities
# Preparing the data for time series analysis
# We'll consider the total emissions for each year as our time series data
time_series_data = total_emissions_per_year
# Plotting the data for visual inspection
plt.figure(figsize=(14, 7))
time_series_data.plot(marker='o', linestyle='-')
plt.title('Total Emissions Over Time (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt
import warnings
warnings.filterwarnings("ignore")
# Splitting the data into train and test sets
train_size = int(len(time_series_data) * 0.75)
train, test = time_series_data[0:train_size], time_series_data[train_size:]
# Function to find the best ARIMA parameters with the lowest AIC value
def best_arima_model(data):
    best_aic = float('inf')
    best_order = None
    for p in range(5):
        for d in range(2):  # limiting d to 1 as our data doesn't seem to require higher order differencing
            for q in range(5):
                try:
                    model = ARIMA(data, order=(p, d, q))
                    model_fit = model.fit()
                    if model_fit.aic < best_aic:
                        best_aic = model_fit.aic
                        best_order = (p, d, q)
                except Exception:
                    # Skip orders that statsmodels cannot fit
                    continue
    return best_order
# Finding the best ARIMA parameters
best_p, best_d, best_q = best_arima_model(train)
# Using the best parameters to fit the ARIMA model
model = ARIMA(train, order=(best_p, best_d, best_q))
model_fit = model.fit()
forecast = model_fit.forecast(steps=len(test))
best_p, best_d, best_q
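The cell imports `mean_squared_error` and `sqrt` but never scores the forecast. With only a few test years, comparing the ARIMA RMSE against a naive drift baseline shows whether the grid-searched order adds anything. The baseline simply extends the line from the first to the last training value. The NumPy sketch below uses hypothetical `drift_forecast` and `rmse` helpers and a toy series rather than the sheet's values. On the real data the comparison would be `rmse(test, forecast)` against `rmse(test, drift_forecast(train, len(test)))`.

```python
import numpy as np

def drift_forecast(train, steps):
    """Naive drift forecast: continue the straight line from the first to the last training value."""
    train = np.asarray(train, dtype=float)
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return train[-1] + slope * np.arange(1, steps + 1)

def rmse(actual, predicted):
    """Root mean squared error."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical toy series
train_vals = [10.0, 8.0, 6.0]
test_vals = [4.0, 3.0]
baseline = drift_forecast(train_vals, len(test_vals))
print(baseline)                              # [4. 2.]
print(round(rmse(test_vals, baseline), 4))   # 0.7071
```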
# Visualizing the actual vs. forecasted emissions
plt.figure(figsize=(14, 7))
plt.plot(train.index, train, label='Train', marker='o')
plt.plot(test.index, test, label='Test', marker='o')
plt.plot(test.index, forecast, label='Forecast', marker='o', linestyle='--')
plt.title('Actual vs. Forecasted Emissions')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
data_5_1
# Grouping data by 'Post Code' and summing up the total emissions
postcode_grouped_data = data_5_1.groupby('Post Code').sum()
# Identifying the total emission for 'West Midlands'
west_midlands_emission = postcode_grouped_data.loc['West Midlands', '2005':]
west_midlands_total_emission = west_midlands_emission.sum()
west_midlands_total_emission
# Loading the data from the "5_1" sheet
data_5_1_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx",
sheet_name="5_1", skiprows=4)
# Dropping the first row (header info)
data_5_1_cleaned = data_5_1_cleaned.drop(0)
# Calculating the total emissions for each year
total_emissions_per_year = data_5_1_cleaned.iloc[:, 6:].sum()
# Renaming the index to match years
years = list(range(2005, 2022)) # Years from 2005 to 2021
total_emissions_per_year.index = years
# Visualizing the total emissions over the years
plt.figure(figsize=(14, 7))
total_emissions_per_year.plot(marker='o', linestyle='-')
plt.title('Total Emissions Over Time UK (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(years, rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
# Visualizing the actual vs. forecasted emissions
plt.figure(figsize=(14, 7))
plt.plot(train.index, train, label='Train', marker='o')
plt.plot(test.index, test, label='Test', marker='o')
plt.plot(test.index, forecast, label='Forecast', marker='o', linestyle='--')
plt.title('Actual vs. Forecasted Emissions')
plt.xlabel('Years')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# Plotting the total emissions over the years again with proper x-axis labels
plt.figure(figsize=(14, 7))
total_emissions_per_year.plot(marker='o', linestyle='-')
plt.title('Total Emissions Over Time (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(ticks=years, labels=years, rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
# Setting up the libraries and data again
import pandas as pd
import matplotlib.pyplot as plt
# Loading the data from the "5_1" sheet again
data_5_1_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx",
                                 sheet_name="5_1", skiprows=4)
data_5_1_cleaned = data_5_1_cleaned.drop(0)
total_emissions_per_year = data_5_1_cleaned.iloc[:, 6:].sum()
years = list(range(2005, 2022))
# Use the years as the index so the tick positions below line up with the data points
total_emissions_per_year.index = years
# Plotting the total emissions over the years with proper x-axis labels
plt.figure(figsize=(14, 7))
total_emissions_per_year.plot(marker='o', linestyle='-')
plt.title('Total Emissions Over Time (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(ticks=years, labels=years, rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
# Re-running the optimized code
# 1. Data Preparation
data_5_1_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx",
sheet_name="5_1", skiprows=4)
data_5_1_cleaned = data_5_1_cleaned.drop(0)
total_emissions_per_year = data_5_1_cleaned.iloc[:, 6:].sum()
years = list(range(2005, 2022))
total_emissions_per_year.index = years
# 2. Plotting Emissions Over Time
plt.figure(figsize=(14, 7))
total_emissions_per_year.plot(marker='o', linestyle='-')
plt.title('Total Emissions Over Time UK (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(years, rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
# 3. Time Series Forecasting with ARIMA
train_size = int(len(total_emissions_per_year) * 0.75)
train, test = total_emissions_per_year[0:train_size], total_emissions_per_year[train_size:]
best_p, best_d, best_q = best_arima_model(train)
model = ARIMA(train, order=(best_p, best_d, best_q))
model_fit = model.fit()
forecast = model_fit.forecast(steps=len(test))
# 4. Plotting Actual vs. Forecasted Emissions
plt.figure(figsize=(14, 7))
plt.plot(train.index, train, label='Train', marker='o')
plt.plot(test.index, test, label='Test', marker='o')
plt.plot(test.index, forecast, label='Forecast', marker='o', linestyle='--')
plt.title('Actual vs. Forecasted Emissions')
plt.xlabel('Years')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(years, rotation=45)
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
best_p, best_d, best_q
# 1. Data Preparation
url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1166194/2005-21-uk-local-authority-ghg-emissions.xlsx'
data_5_1_cleaned = pd.read_excel(url, sheet_name='5_1', skiprows=4)
data_5_1_cleaned = data_5_1_cleaned.drop(0)
total_emissions_per_year = data_5_1_cleaned.iloc[:, 6:].sum()
years = list(range(2005, 2022))
total_emissions_per_year.index = years
# 2. Plotting Emissions Over Time
plt.figure(figsize=(14, 7))
total_emissions_per_year.plot(marker='o', linestyle='-')
plt.title('Total Emissions Over Time in the UK (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(years, rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
# 3. Time Series Forecasting with ARIMA
train_size = int(len(total_emissions_per_year) * 0.75)
train, test = total_emissions_per_year[0:train_size], total_emissions_per_year[train_size:]
best_p, best_d, best_q = best_arima_model(train)
model = ARIMA(train, order=(best_p, best_d, best_q))
model_fit = model.fit()
forecast = model_fit.forecast(steps=len(test))
# 4. Plotting Actual vs. Forecasted Emissions
plt.figure(figsize=(14, 7))
plt.plot(train.index, train, label='Train', color='blue', marker='o')
plt.plot(test.index, test, label='Test', color='green', marker='o')
plt.plot(test.index, forecast, label='Forecast',color='red', marker='o', linestyle='--')
plt.title('Actual vs. Forecasted Emissions')
plt.xlabel('Years')
plt.ylabel('Emissions (kt CO2e)')
plt.xticks(years, rotation=45)
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
best_p, best_d, best_q
# Loading the cleaned dataset from sheet "5_1"
data_5_1_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name="5_1", skiprows=4)
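# List of local authority district names in West Midlands
# Caution: "Newport" and "Sutton" match local authorities outside the West Midlands
# (Newport in Wales and the London Borough of Sutton), so their rows are also picked up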
west_midlands_districts = [
"Lichfield", "Newport", "Rugby", "Shropshire", "Staffordshire Moorlands",
"Birmingham", "Stoke-on-Trent", "South Staffordshire", "Coventry", "Cannock Chase",
"Dudley", "Stafford", "Sutton", "Telford and Wrekin", "Wolverhampton",
"Tamworth", "Walsall", "Warwick", "Sandwell", "Herefordshire, County of",
"East Staffordshire", "Stratford-on-Avon", "Newcastle-under-Lyme", "Bromsgrove",
"Solihull", "Nuneaton and Bedworth", "Redditch", "Wyre Forest", "Worcester"
]
# Filtering the dataset for only the West Midlands districts
west_midlands_data = data_5_1_cleaned[data_5_1_cleaned['Local Authority District name (2013 boundaries)'].isin(west_midlands_districts)]
# Displaying the first few rows of the West Midlands data
west_midlands_data.head()
# Displaying the columns of the cleaned data to identify the correct name for "Local Authority District name (2013 boundaries)"
data_5_1_cleaned.columns
# Displaying the first few rows of the cleaned dataset to understand its structure
data_5_1_cleaned.head()
# Adjusting the dataset to use the correct column names and dropping unnecessary rows
data_5_1_cleaned.columns = data_5_1_cleaned.iloc[0]
data_5_1_cleaned = data_5_1_cleaned.drop(0)
# Filtering the dataset for only the West Midlands districts
west_midlands_data = data_5_1_cleaned[data_5_1_cleaned['Local Authority District Name'].isin(west_midlands_districts)]
# Displaying the first few rows of the West Midlands data
west_midlands_data.head()
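The fix above promotes the first data row to the header. The reusable sketch below does the same and also strips stray whitespace from the new names. Left in place, that whitespace breaks exact lookups such as `'Local Authority District Name'`. `promote_header` is a hypothetical helper, and the toy frame stands in for the sheet.

```python
import pandas as pd

def promote_header(df, row=0):
    """Use the values in `row` as column names (whitespace stripped) and keep only the rows below it."""
    new_columns = df.iloc[row].astype(str).str.strip()
    out = df.iloc[row + 1:].copy()
    out.columns = list(new_columns)
    return out.reset_index(drop=True)

# Hypothetical raw sheet whose real header sits in the first data row
raw = pd.DataFrame([
    ["Local Authority District Name ", "2005", "2006"],
    ["Birmingham", 100.0, 90.0],
    ["Cardiff", 50.0, 45.0],
])
clean = promote_header(raw)
west_midlands_rows = clean[clean["Local Authority District Name"].isin(["Birmingham", "Coventry"])]
print(west_midlands_rows)
```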
# Summarizing the total emissions over the years for the West Midlands districts
yearly_emissions_west_midlands = west_midlands_data.iloc[:, 5:].sum().reset_index()
yearly_emissions_west_midlands.columns = ['Year', 'Total Emissions']
# Displaying the summarized yearly emissions for West Midlands
yearly_emissions_west_midlands
# Summarizing the total emissions over the years for the West Midlands districts without resetting the index
yearly_emissions_west_midlands = west_midlands_data.iloc[:, 5:].sum()
# Converting the series to a dataframe for better visualization
yearly_emissions_west_midlands_df = yearly_emissions_west_midlands.reset_index()
yearly_emissions_west_midlands_df.columns = ['Year', 'Total Emissions']
# Displaying the summarized yearly emissions for West Midlands
yearly_emissions_west_midlands_df
# Summarizing the total emissions over the years for the West Midlands districts
# (summing across districts leaves one total per year column)
yearly_emissions_west_midlands = west_midlands_data.groupby('Local Authority District Name').sum().sum(axis=0)
# Drop the 'Row Labels' entry if it is present; errors='ignore' avoids a KeyError when it is not
yearly_emissions_west_midlands = yearly_emissions_west_midlands.drop('Row Labels', errors='ignore')
# Displaying the summarized yearly emissions for West Midlands (a Series indexed by year)
yearly_emissions_west_midlands
import matplotlib.pyplot as plt
# Plotting the emissions over time for visual inspection of anomalies
plt.figure(figsize=(14, 7))
plt.plot(yearly_emissions_west_midlands.index, yearly_emissions_west_midlands.values, marker='o', linestyle='-')
plt.title("Total Yearly Emissions for West Midlands Districts (2005-2021)")
plt.xlabel("Year")
plt.ylabel("Total Emissions (ktCO₂e)")
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
From the plot, I observed the following:
Consistent Decline: From 2005 onwards, emissions decline steadily, with slight fluctuations.
Significant Drop: Around 2020 the decline becomes sharper. This could be attributed to several factors, including the COVID-19 pandemic, which reduced industrial activity and travel through lockdowns and other restrictions.
Possible Anomalies: Although the overall trend is downward, in some years the decline was less pronounced, and in a few years emissions even rose slightly.
Inflection Points: Around the years 2007-2008 and 2013-2014, there seem to be inflection points where the rate of decrease in emissions changes. Investigating the reasons behind these inflections could provide insights into specific interventions, policies, or global events that impacted emissions.
Consistent Trends: The consistency in the decline, especially from 2010 onwards, might indicate the effectiveness of environmental policies, advancements in green technologies, or shifts in industrial practices in the West Midlands.
External Factors: It's crucial to consider external factors that might have influenced these emissions, for instance: economic downturns or booms, which change industrial output and thus emissions; the introduction of renewable energy sources or energy-efficient technologies in the region; changes in transportation infrastructure or in the adoption of electric vehicles; and policies and incentives promoting sustainable practices among businesses and households.
To delve deeper:
Sectoral Analysis: One could break down the emissions by sector (e.g., transportation, industrial, residential) to understand which sectors contribute most to the emissions and which have seen the most significant reductions.
Correlation with Other Data: Comparing this emissions data with other datasets, such as economic indicators, energy consumption patterns, or transportation stats, might shed light on the drivers behind these emission trends.
Spatial Analysis: It would be beneficial to compare West Midlands' emission trends with other regions in the UK or globally. Such a comparison might help understand if West Midlands' trends are unique or part of a broader national or global trend.
Deep Dive into Anomalies: For the years where the decline wasn't as pronounced or even increased slightly, one could investigate specific events or changes during those years that might have led to such patterns.
Interviews & Surveys: Engaging with local environmental agencies, businesses, and communities can provide qualitative insights into the reasons behind the observed trends. They might be aware of local initiatives, challenges, or events that influenced emissions.
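Several of the observations above can be checked numerically rather than by eye: the sharper decline around 2020, and the years where emissions rose slightly. The minimal pandas sketch below assumes the yearly totals are held in a Series indexed by year. The values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical yearly totals (kt CO2e), indexed by year
totals = pd.Series([100.0, 95.0, 96.0, 90.0, 80.0],
                   index=[2017, 2018, 2019, 2020, 2021])

# Year-over-year percentage change; the first year has no predecessor, so drop its NaN
yoy = totals.pct_change().mul(100).dropna()

sharpest_drop_year = yoy.idxmin()            # year with the largest relative decline
increase_years = list(yoy[yoy > 0].index)    # years in which emissions rose
print(yoy.round(2))
print(sharpest_drop_year, increase_years)
```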
# Extracting unique sector names from the dataset
sectors = west_midlands_data['Sector Name'].unique()
sectors
# Checking the column names in the dataset
west_midlands_data.columns
# Grouping by 'Operator' and summing the emissions for each year
operator_emissions = west_midlands_data.groupby('Operator').sum()
# Identifying the top 10 emitting operators over the entire period (2005-2021)
top_operators = operator_emissions.sum(axis=1).nlargest(10)
top_operators
import matplotlib.pyplot as plt
# Extracting emission data for the top operators
top_operator_data = operator_emissions.loc[top_operators.index]
# Plotting the emissions trends of top operators over time
plt.figure(figsize=(15, 8))
for operator in top_operator_data.index:
    plt.plot(top_operator_data.columns, top_operator_data.loc[operator], label=operator, marker='o')
plt.title('Emission Trends of Top Operators (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (ktCO₂e)')
plt.xticks(rotation=45)
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.tight_layout()
plt.show()
Approach: group the data by 'Operator' and sum the emissions for each year, identify the top emitting operators over the entire period, and visualize their emission trends over time.
The top 10 emitting operators in the West Midlands region over the entire period (2005-2021):
1. Rugeley Power Ltd: 50,060.33 ktCO₂e
2. Rugby Ltd: 20,268.41 ktCO₂e
3. E.ON UK Plc: 10,520.58 ktCO₂e
4. Siemens Plc: 9,250.63 ktCO₂e
5. AES Fifoots Point Ltd: 8,837.38 ktCO₂e
6. MES Environmental Ltd: 7,130.30 ktCO₂e
7. TXU Europe Merchant Generation Ltd: 6,428.79 ktCO₂e
8. Biffa Waste Services Ltd: 5,631.62 ktCO₂e
9. Blue Circle Industries Plc: 5,405.53 ktCO₂e
10. Tyseley Waste Disposal Ltd: 4,507.63 ktCO₂e
Observations:
Rugeley Power Ltd: This operator had a very high emission level, especially from 2005 to 2016. However, there's a significant drop after 2016, indicating some change in operations or policies.
Rugby Ltd: Emissions have decreased over the years, with some spikes in the middle years.
E.ON UK Plc: This operator's emissions have generally been on the decline since 2005.
Other operators have also shown varying emission patterns, but in most cases, there's a noticeable downward trend in recent years.
This visualization provides a snapshot of the emission behaviors of the top operators. Each operator's trend could be influenced by various factors, including changes in operational scale, adoption of sustainable technologies, regulatory pressures, or shifts in energy sources.
from scipy.stats import linregress
# Calculate the linear regression slope for each operator's emissions over the years
slopes = {}
for operator, row in operator_emissions.iterrows():
    slope, _, _, _, _ = linregress(range(len(row)), row.values)
    slopes[operator] = slope
# Filter operators with positive slopes (indicating an upward trend)
increasing_operators = {operator: slope for operator, slope in slopes.items() if slope > 0}
# Rank operators based on the slope value
sorted_increasing_operators = dict(sorted(increasing_operators.items(), key=lambda item: item[1], reverse=True))
sorted_increasing_operators
To find operators whose emissions rose over the years: calculate the linear regression slope of each operator's yearly emissions, keep the operators with a positive slope (an upward trend), and rank them by slope to pinpoint the steepest increases.
From our analysis, these are the top operators in the West Midlands with an increasing emissions trend, ranked by the slope of their linear fit:
1. Siemens Plc: approximately +52.66 ktCO₂e per year
2. Lafarge Cauldon Ltd: approximately +49.66 ktCO₂e per year
3. Veolia ES Staffordshire Ltd: approximately +28.02 ktCO₂e per year
4. Tyseley Waste Disposal Ltd: approximately +10.99 ktCO₂e per year
5. Viridor South London Ltd: approximately +10.29 ktCO₂e per year
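A positive slope on its own does not make a trend steady, because one late spike can produce a large slope. `scipy.stats.linregress` also returns `rvalue`, and its square measures how well a straight line fits the data. The NumPy sketch below reports both the slope and R². It uses a hypothetical `trend_stats` helper and toy data.

```python
import numpy as np

def trend_stats(values):
    """Least-squares slope per step and R^2 of a straight-line fit to `values`."""
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r2

steady = trend_stats([1.0, 3.0, 5.0, 7.0])    # perfectly linear rise
spiky = trend_stats([0.0, 0.0, 0.0, 40.0])    # flat, then one late spike
print(f"steady: slope={steady[0]:.2f}, R^2={steady[1]:.2f}")  # steady: slope=2.00, R^2=1.00
print(f"spiky: slope={spiky[0]:.2f}, R^2={spiky[1]:.2f}")     # spiky: slope=12.00, R^2=0.60
```

The spiky series has the larger slope but a much poorer straight-line fit. Filtering on both values separates steady risers from one-off jumps.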
# Loading the cleaned dataset from sheet "2_1"
data_2_1_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name="2_1", skiprows=4)
# List of local authority district names in West Midlands
west_midlands_districts = [
"Lichfield", "Newport", "Rugby", "Shropshire", "Staffordshire Moorlands",
"Birmingham", "Stoke-on-Trent", "South Staffordshire", "Coventry", "Cannock Chase",
"Dudley", "Stafford", "Sutton", "Telford and Wrekin", "Wolverhampton",
"Tamworth", "Walsall", "Warwick", "Sandwell", "Herefordshire, County of",
"East Staffordshire", "Stratford-on-Avon", "Newcastle-under-Lyme", "Bromsgrove",
"Solihull", "Nuneaton and Bedworth", "Redditch", "Wyre Forest", "Worcester"
]
# Displaying the column names of the dataset from sheet "2_1" to identify the correct name for the local authority district
data_2_1_cleaned.columns
# Filtering the dataset for only the West Midlands districts using the correct column name
west_midlands_data_2_1 = data_2_1_cleaned[data_2_1_cleaned['Local Authority'].isin(west_midlands_districts)]
# Grouping by year and summing up the emissions for all West Midlands districts
west_midlands_yearly_emissions = west_midlands_data_2_1.groupby('Calendar Year')['Grand Total'].sum().reset_index()
# Displaying the yearly emissions for the West Midlands districts
west_midlands_yearly_emissions
# Columns of interest for emissions analysis
emission_sections = [
"Industry Electricity", "Industry Gas", "Large Industrial Installations", "Industry 'Other'", "Industry Total",
"Commercial Electricity", "Commercial Gas", "Commercial 'Other'", "Commercial Total",
"Public Sector Electricity", "Public Sector Gas", "Public Sector 'Other'", "Public Sector Total",
"Domestic Electricity", "Domestic Gas", "Domestic 'Other'", "Domestic Total",
"Road Transport (A roads)", "Road Transport (Minor roads)", "Transport 'Other'", "Transport Total",
"Agriculture Electricity", "Agriculture Gas", "Agriculture 'Other'", "Agriculture Total",
"Waste Management 'Other'", "Waste Management Total"
]
# Summing up the emissions for each section across all years
total_emissions_by_section = west_midlands_data_2_1[emission_sections].sum().sort_values(ascending=False)
# Extracting the top 5 emission sections
top_5_emission_sections = total_emissions_by_section.head(5)
top_5_emission_sections
# Checking the available columns in the dataset
available_columns = west_midlands_data_2_1.columns
# Filtering out the columns from the emission_sections list that are not in the available columns
valid_emission_sections = [col for col in emission_sections if col in available_columns]
valid_emission_sections
# Summing up the emissions for each valid section across all years
total_emissions_by_valid_section = west_midlands_data_2_1[valid_emission_sections].sum().sort_values(ascending=False)
# Extracting the top 5 emission sections from the valid sections
top_5_emission_valid_sections = total_emissions_by_valid_section.head(5)
top_5_emission_valid_sections
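`emission_sections` mixes sub-sector columns with their "... Total" columns. Each total therefore repeats its own components, and the top-5 ranking may end up dominated by totals. The sketch below ranks components only, assuming the "... Total" naming convention used in `emission_sections`. The numbers are invented.

```python
import pandas as pd

# Hypothetical section sums over all years (kt CO2e)
section_totals = pd.Series({
    "Industry Electricity": 40.0, "Industry Gas": 30.0, "Industry Total": 70.0,
    "Domestic Gas": 50.0, "Domestic Total": 50.0,
    "Transport Total": 60.0, "Road Transport (A roads)": 35.0,
})

# Keep only component columns so that no emissions are counted twice
components = section_totals[~section_totals.index.str.endswith("Total")]
top_components = components.sort_values(ascending=False).head(3)
print(top_components)
```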
# Grouping by "Calendar Year" and summing the emissions for top 5 sections
yearly_emissions = west_midlands_data_2_1.groupby("Calendar Year")[top_5_emission_valid_sections.index].sum()
# Plotting the trends
plt.figure(figsize=(15, 10))
for section in top_5_emission_valid_sections.index:
    plt.plot(yearly_emissions.index, yearly_emissions[section], label=section, marker='o')
plt.title('Emission Trends for Top 5 Emission Sources in West Midlands (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.legend()
plt.grid(True)
plt.show()
# Loading the cleaned dataset from sheet "1_2"
data_1_2_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name="1_2", skiprows=4)
# List of local authority district names in West Midlands
west_midlands_districts = [
"Lichfield", "Newport", "Rugby", "Shropshire", "Staffordshire Moorlands",
"Birmingham", "Stoke-on-Trent", "South Staffordshire", "Coventry", "Cannock Chase",
"Dudley", "Stafford", "Sutton", "Telford and Wrekin", "Wolverhampton",
"Tamworth", "Walsall", "Warwick", "Sandwell", "Herefordshire, County of",
"East Staffordshire", "Stratford-on-Avon", "Newcastle-under-Lyme", "Bromsgrove",
"Solihull", "Nuneaton and Bedworth", "Redditch", "Wyre Forest", "Worcester"
]
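An exact-match filter such as `isin(west_midlands_districts)` silently drops any name that is misspelled or absent from the sheet, and the totals shrink without any error. A quick check is sketched below, assuming the names live in a 'Local Authority' column as in the cells that follow; the helper `check_district_names` and the sample rows are hypothetical.

```python
import pandas as pd

def check_district_names(df, districts, name_col="Local Authority"):
    """Return the requested district names that never appear in df[name_col].

    Names in the data are stripped of surrounding spaces before comparing.
    """
    present = set(df[name_col].dropna().astype(str).str.strip())
    return sorted(set(districts) - present)

# Synthetic rows for illustration only (not taken from the spreadsheet)
sample = pd.DataFrame({
    "Local Authority": ["Birmingham", "Coventry ", "Dudley"],
    "Calendar Year": [2021, 2021, 2021],
})
missing = check_district_names(sample, ["Birmingham", "Coventry", "Solihull"])
print(missing)  # ['Solihull']
```

Applied to the real data, `check_district_names(data_1_2_cleaned, west_midlands_districts)` should return an empty list if every name in the list matches a row.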
# Displaying the column names of the dataset from sheet "1_2" to identify the correct name for the local authority district
data_1_2_cleaned.columns
# Filtering the dataset for only the West Midlands districts using the correct column name
west_midlands_data_1_2 = data_1_2_cleaned[data_1_2_cleaned['Local Authority'].isin(west_midlands_districts)]
# Grouping by year and summing up the emissions for all West Midlands districts
west_midlands_yearly_emissions = west_midlands_data_1_2.groupby('Calendar Year')['Grand Total'].sum().reset_index()
# Displaying the yearly emissions for the West Midlands districts
west_midlands_yearly_emissions
# Displaying the columns of the dataset to identify the correct column names
data_1_2_cleaned.columns
# Appending a trailing space to names ending in 'Gas', to match how those columns are named in the sheet
columns_to_consider_corrected = [
col if not col.endswith('Gas') else col + ' ' for col in columns_to_consider
]
# Sum the emissions across all years for each corrected column
emissions_sum_corrected = west_midlands_data_1_2[columns_to_consider_corrected].sum()
# Sort the columns based on the total emissions and select top 5
top_5_emissions_sources_corrected = emissions_sum_corrected.sort_values(ascending=False).head(5)
top_5_emissions_sources_corrected
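# Matching each requested name to the dataset's actual column name, ignoring leading/trailing spaces
actual_by_stripped = {col.strip(): col for col in west_midlands_data_1_2.columns if isinstance(col, str)}
corrected_columns = [actual_by_stripped[col.strip()] for col in columns_to_consider_corrected if col.strip() in actual_by_stripped]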
# Sum the emissions across all years for each corrected column
emissions_sum_final = west_midlands_data_1_2[corrected_columns].sum()
# Sort the columns based on the total emissions and select top 5
top_5_emissions_sources_final = emissions_sum_final.sort_values(ascending=False).head(5)
top_5_emissions_sources_final
# Directly using the known columns for emission sources
emission_columns = [
"Industry Electricity", "Industry Gas ", "Large Industrial Installations", "Industry 'Other'", "Industry Total",
"Commercial Electricity", "Commercial Gas ", "Commercial 'Other'", "Commercial Total",
"Public Sector Electricity", "Public Sector Gas ", "Public Sector 'Other'", "Public Sector Total",
"Domestic Electricity", "Domestic Gas ", "Domestic 'Other'", "Domestic Total",
"Road Transport (A roads)", "Road Transport (Motorways)", "Road Transport (Minor roads)", "Diesel Railways",
"Transport 'Other'", "Transport Total", "Net Emissions: Forest land", "Net Emissions: Cropland",
"Net Emissions: Grassland", "Net Emissions: Wetlands", "Net Emissions: Settlements",
"Net Emissions: Harvested Wood Products", "Net Emissions: Indirect N2O", "LULUCF Net Emissions",
"Agriculture Electricity", "Agriculture Gas ", "Agriculture 'Other'", "Agriculture Livestock",
"Agriculture Soils", "Agriculture Total", "Landfill", "Waste Management 'Other'", "Waste Management Total"
]
# Sum the emissions across all years for each emission column
emissions_sum_actual = west_midlands_data_1_2[emission_columns].sum()
# Sort the columns based on the total emissions and select top 5
top_5_emissions_sources_actual = emissions_sum_actual.sort_values(ascending=False).head(5)
top_5_emissions_sources_actual
# Identify the missing columns from the dataset
missing_columns = [col for col in emission_columns if col not in west_midlands_data_1_2.columns]
missing_columns
# Excluding the missing columns
emission_columns_excluding_missing = [col for col in emission_columns if col not in missing_columns]
# Sum the emissions across all years for each emission column excluding the missing ones
emissions_sum_excluding_missing = west_midlands_data_1_2[emission_columns_excluding_missing].sum()
# Sort the columns based on the total emissions and select top 5
top_5_emissions_sources_excluding_missing = emissions_sum_excluding_missing.sort_values(ascending=False).head(5)
top_5_emissions_sources_excluding_missing
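The cells above repeatedly add or strip trailing spaces (for example `"Domestic Gas "`) to make the names match. A sketch of an alternative, assuming stray whitespace is the only difference between the requested names and the sheet's headers: strip the headers once, then report any names that still have no column instead of raising a `KeyError`. The helper `sum_by_source` is hypothetical.

```python
import pandas as pd

def sum_by_source(df, sources):
    """Total each source column after stripping stray whitespace from names.

    Returns (totals, missing): totals is a Series sorted high-to-low and
    missing lists requested names that have no matching column.
    Note: if two headers differ only by whitespace they collapse to one name.
    """
    clean = df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)
    wanted = [s.strip() for s in sources]
    present = [s for s in wanted if s in clean.columns]
    missing = [s for s in wanted if s not in clean.columns]
    totals = clean[present].sum().sort_values(ascending=False)
    return totals, missing

# Synthetic frame: note the trailing space in "Domestic Gas "
frame = pd.DataFrame({
    "Domestic Gas ": [10.0, 12.0],
    "Industry Total": [30.0, 25.0],
    "Transport Total": [40.0, 41.0],
})
totals, missing = sum_by_source(frame, ["Domestic Gas", "Industry Total", "Transport Total", "Landfill"])
print(totals)
print(missing)  # ['Landfill']
```

On the real data this would be called as `sum_by_source(west_midlands_data_1_2, emission_columns)`, with `.head(5)` on the first result giving the top five.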
# Top 5 totals (kt CO2e, summed over 2005-2021); these values are hard-coded rather than computed,
# so re-derive them from the sums above if the district list or column names change
top_5_emission_sources = {
"Transport Total": 194372.61,
"Domestic Total": 177573.77,
"Industry Total": 132556.21,
"Road Transport (A roads)": 72630.97,
"Domestic Electricity": 67037.06
}
# Plotting the top 5 emission sources
plt.figure(figsize=(12, 7))
plt.bar(list(top_5_emission_sources.keys()), list(top_5_emission_sources.values()), color=['blue', 'green', 'red', 'cyan', 'yellow'])
plt.xlabel('Emission Sources')
plt.ylabel('Emissions (kt CO₂e)')
plt.title('Top 5 Emission Sources in West Midlands (2005-2021)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
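The ranking above mixes sector totals with their own sub-components: 'Road Transport (A roads)' is part of 'Transport Total' and 'Domestic Electricity' is part of 'Domestic Total'. A minimal sketch that ranks only the sector totals, assuming they are exactly the columns whose names end in "Total" (so 'LULUCF Net Emissions' is left out by this rule); `top_sector_totals` is a hypothetical helper and the numbers are synthetic.

```python
import pandas as pd

def top_sector_totals(totals, n=5):
    """Return the n largest entries whose names end in 'Total', excluding 'Grand Total'.

    Sub-components such as 'Road Transport (A roads)' are dropped, so each bar is a
    separate sector rather than part of another bar.
    """
    keep = [name for name in totals.index
            if isinstance(name, str) and name.endswith("Total") and name != "Grand Total"]
    return totals[keep].nlargest(n)

# Synthetic totals, for illustration only
example = pd.Series({
    "Transport Total": 100.0,
    "Road Transport (A roads)": 40.0,
    "Domestic Total": 90.0,
    "Domestic Electricity": 30.0,
    "Industry Total": 70.0,
    "Grand Total": 300.0,
})
top = top_sector_totals(example, n=3)
print(top)
# A bar chart like the one above could then be drawn with, e.g.:
# top.plot(kind="bar", ylabel="Emissions (kt CO2e)")
```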
# Displaying the column names of the dataset from sheet "1_2"
columns_1_2 = data_1_2_cleaned.columns.tolist()
# Summing each emission source across all years for the West Midlands districts
emission_sums = west_midlands_data_1_2[emission_columns].sum().sort_values(ascending=False)
# Displaying the top 5 emission sources
top_5_emission_sources = emission_sums.head(5)
top_5_emission_sources
# Removing any trailing spaces from the emission columns list
cleaned_emission_columns = [col.strip() for col in emission_columns]
# Calculating the sum of emissions for each emission source across all years for West Midlands districts
emission_sums = west_midlands_data_1_2[cleaned_emission_columns].sum().sort_values(ascending=False)
# Displaying the top 5 emission sources
top_5_emission_sources = emission_sums.head(5)
top_5_emission_sources
# Checking the columns in the dataset to find the correct names
matching_columns = [col for col in data_1_2_cleaned.columns if any(emission in col for emission in cleaned_emission_columns)]
non_matching_columns = set(cleaned_emission_columns) - set(matching_columns)
matching_columns, non_matching_columns
# Correcting the column names
corrected_emission_columns = [col if col not in non_matching_columns else col + " " for col in cleaned_emission_columns]
# Calculating the sum of emissions for each emission source across all years for West Midlands districts
emission_sums = west_midlands_data_1_2[corrected_emission_columns].sum().sort_values(ascending=False)
# Displaying the top 5 emission sources
top_5_emission_sources = emission_sums.head(5)
top_5_emission_sources
# Grouping by "Calendar Year" and summing the emissions for top 5 sections
columns_of_interest = ["Transport Total", "Domestic Total", "Industry Total", "Domestic Gas", "Road Transport (A roads)"]
yearly_emissions = west_midlands_data_1_2.groupby('Calendar Year')[columns_of_interest].sum()
# Plotting the trend for the top 5 sections
yearly_emissions.plot(figsize=(14, 7), marker='o')
plt.title("Yearly Emissions Trend for Top 5 Sections in West Midlands")
plt.ylabel("Emissions (kt CO₂e)")
plt.xlabel("Year")
plt.grid(True)
plt.legend(loc="upper right")
plt.tight_layout()
plt.show()
The graph above shows the yearly CO2 emissions for the top 5 sections in the West Midlands from 2005 to 2021 (sheet 1_2).
Transport Total: consistently the highest emitter across the period, with a slight downward trend.
Domestic Total: emissions from the domestic sector have fallen gradually since 2005.
Industry Total: total industrial emissions also decline, though less steeply than domestic emissions.
Domestic Gas: emissions from domestic gas use have fallen over the period, in line with the domestic total.
Road Transport (A roads): emissions from A roads have stayed relatively stable, with small fluctuations and no clear downward trend.
These trends point to progress in cutting emissions in some sectors, especially the domestic and industry sectors, while transport, particularly A roads, has seen little reduction. Domestic Gas and Road Transport (A roads) are components of Domestic Total and Transport Total, so the five lines overlap rather than adding up to a regional total. This information can guide policy-making and intervention strategies to further reduce emissions in the West Midlands.
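Words like "declining" and "stable" can be backed with numbers. A sketch, assuming a frame indexed by numeric calendar years in ascending order (such as `yearly_emissions` above): for each column it reports the percentage change from the first to the last year and a least-squares slope in kt CO2e per year. The helper `summarise_trends` is hypothetical and the example values are synthetic.

```python
import numpy as np
import pandas as pd

def summarise_trends(yearly):
    """Per column: % change first-to-last year and linear-fit slope (units per year)."""
    years = yearly.index.to_numpy(dtype=float)
    rows = {}
    for col in yearly.columns:
        values = yearly[col].to_numpy(dtype=float)
        slope = np.polyfit(years, values, 1)[0]  # degree-1 fit: slope is the first coefficient
        pct = 100.0 * (values[-1] - values[0]) / values[0]
        rows[col] = {"pct_change": pct, "slope_per_year": slope}
    return pd.DataFrame(rows).T

# Synthetic example: one falling series, one flat series
yearly = pd.DataFrame({"A": [100.0, 90.0, 80.0], "B": [50.0, 50.0, 50.0]},
                      index=[2005, 2006, 2007])
summary = summarise_trends(yearly)
print(summary)
```

On the real data, `summarise_trends(yearly_emissions)` would give one row per section; a slope near zero relative to the section's size supports "stable", a clearly negative one supports "declining".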
# Loading the dataset from sheet "1_3"
data_1_3_cleaned = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name="1_3", skiprows=4)
# Filtering the dataset for only the West Midlands districts using the correct column name
west_midlands_data_1_3 = data_1_3_cleaned[data_1_3_cleaned['Local Authority'].isin(west_midlands_districts)]
# Grouping by year and summing up the emissions for all West Midlands districts
west_midlands_yearly_emissions_methane = west_midlands_data_1_3.groupby('Calendar Year')['Grand Total'].sum().reset_index()
# Displaying the yearly methane emissions for the West Midlands districts
west_midlands_yearly_emissions_methane
# List of emission columns
emission_columns = [
"Industry Electricity", "Industry Gas", "Large Industrial Installations", "Industry 'Other'", "Industry Total",
"Commercial Electricity", "Commercial Gas", "Commercial 'Other'", "Commercial Total",
"Public Sector Electricity", "Public Sector Gas", "Public Sector 'Other'", "Public Sector Total",
"Domestic Electricity", "Domestic Gas", "Domestic 'Other'", "Domestic Total",
"Road Transport (A roads)", "Road Transport (Motorways)", "Road Transport (Minor roads)", "Diesel Railways",
"Transport 'Other'", "Transport Total", "Net Emissions: Forest land", "Net Emissions: Cropland",
"Net Emissions: Grassland", "Net Emissions: Wetlands", "Net Emissions: Settlements",
"Net Emissions: Harvested Wood Products", "Net Emissions: Indirect N2O", "LULUCF Net Emissions",
"Agriculture Electricity", "Agriculture Gas", "Agriculture 'Other'", "Agriculture Livestock",
"Agriculture Soils", "Agriculture Total", "Landfill", "Waste Management 'Other'", "Waste Management Total"
]
# Summing emissions across all years for each emission column
emissions_total = west_midlands_data_1_3[emission_columns].sum()
# Getting the top 5 emission sources based on the total emissions
top_5_emissions_sources = emissions_total.nlargest(5)
top_5_emissions_sources
# Checking the columns in the dataset to identify any discrepancies
[col for col in west_midlands_data_1_3.columns if 'Gas' in col]
# Adjusting the column names based on the actual names in the dataset
adjusted_emission_columns = [
"Industry Electricity", "Industry Gas ", "Large Industrial Installations", "Industry 'Other'", "Industry Total",
"Commercial Electricity", "Commercial Gas ", "Commercial 'Other'", "Commercial Total",
"Public Sector Electricity", "Public Sector Gas ", "Public Sector 'Other'", "Public Sector Total",
"Domestic Electricity", "Domestic Gas", "Domestic 'Other'", "Domestic Total",
"Road Transport (A roads)", "Road Transport (Motorways)", "Road Transport (Minor roads)", "Diesel Railways",
"Transport 'Other'", "Transport Total", "Net Emissions: Forest land", "Net Emissions: Cropland",
"Net Emissions: Grassland", "Net Emissions: Wetlands", "Net Emissions: Settlements",
"Net Emissions: Harvested Wood Products", "Net Emissions: Indirect N2O", "LULUCF Net Emissions",
"Agriculture Electricity", "Agriculture Gas", "Agriculture 'Other'", "Agriculture Livestock",
"Agriculture Soils", "Agriculture Total", "Landfill", "Waste Management 'Other'", "Waste Management Total"
]
# Summing emissions across all years for each emission column using the adjusted column names
adjusted_emissions_total = west_midlands_data_1_3[adjusted_emission_columns].sum()
# Getting the top 5 emission sources based on the total emissions
top_5_adjusted_emissions_sources = adjusted_emissions_total.nlargest(5)
top_5_adjusted_emissions_sources
# Summing the emissions across all years for each emission column (adjusted names, so the "Gas " columns are found)
emission_totals_1_3 = west_midlands_data_1_3[adjusted_emission_columns].sum()
# Extracting the top 5 emission sources
top_5_sources_1_3 = emission_totals_1_3.nlargest(5).index
top_5_emissions_1_3 = west_midlands_data_1_3.groupby("Calendar Year")[top_5_sources_1_3].sum()
# Plotting the top 5 emission sources
top_5_emissions_1_3.plot(figsize=(14, 7), title="Top 5 Methane Emission Sources in West Midlands (2005-2021)")
plt.ylabel("Emissions (kt CO₂e)")
plt.grid(True)
plt.legend(loc="upper right")
plt.tight_layout()
plt.show()
# Keeping only the emission columns that exist in sheet "1_3" (names that differ, e.g. by a trailing space, are dropped)
corrected_emission_columns = [col for col in emission_columns if col in west_midlands_data_1_3.columns]
# Summing the emissions across all years for each corrected emission column
emission_totals_1_3_corrected = west_midlands_data_1_3[corrected_emission_columns].sum()
# Extracting the top 5 emission sources
top_5_sources_1_3_corrected = emission_totals_1_3_corrected.nlargest(5).index
top_5_emissions_1_3_corrected = west_midlands_data_1_3.groupby("Calendar Year")[top_5_sources_1_3_corrected].sum()
# Plotting the top 5 emission sources
top_5_emissions_1_3_corrected.plot(figsize=(14, 7), title="Top 5 Methane Emission Sources in West Midlands (2005-2021) from sheet '1_3'", marker='o')
plt.ylabel("Emissions (kt CO₂e)")
plt.grid(True)
plt.legend(loc="upper right")
plt.tight_layout()
plt.show()
# Summing the emissions across all years for each emission column and then getting the least 5
least_5_emissions = west_midlands_data_1_3[emission_columns].sum().nsmallest(5)
# Displaying the least 5 emission sources
least_5_emissions
# Stripping surrounding spaces from the names (the list defined for sheet "1_3" already has none)
corrected_emission_columns = [col.strip() for col in emission_columns]
# Summing the emissions across all years for each emission column and then getting the least 5
least_5_emissions = west_midlands_data_1_3[corrected_emission_columns].sum().nsmallest(5)
# Displaying the least 5 emission sources
least_5_emissions
# Checking the available columns in the dataframe to identify the discrepancies in the names
available_columns = west_midlands_data_1_3.columns
discrepancy_columns = [col for col in corrected_emission_columns if col not in available_columns]
# Displaying the columns causing discrepancies
discrepancy_columns
# Checking the column names in the dataframe that are closest to the discrepancy columns
potential_matches = {discrepancy: [col for col in available_columns if discrepancy.split()[0] in col] for discrepancy in discrepancy_columns}
# Displaying the potential matches for the discrepancy columns
potential_matches
# Correcting the names in the emission columns list
corrected_emission_columns = [col if col not in discrepancy_columns else col + ' ' for col in corrected_emission_columns]
# Summing the emissions across all years for each emission column and then getting the least 5
least_5_emissions = west_midlands_data_1_3[corrected_emission_columns].sum().nsmallest(5)
# Displaying the least 5 emission sources
least_5_emissions
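`nsmallest` ranks by signed value, so if any column has a negative total (the 'Net Emissions: ...' LULUCF columns can represent net removals), the "least 5" would surface carbon sinks rather than small emitters. A sketch that separates the two, assuming the input is a Series of totals like the one summed above; `split_sources_and_sinks` is a hypothetical helper and the values are synthetic.

```python
import pandas as pd

def split_sources_and_sinks(totals):
    """Separate net emitters (total >= 0) from net sinks (total < 0), each sorted ascending."""
    emitters = totals[totals >= 0].sort_values()
    sinks = totals[totals < 0].sort_values()
    return emitters, sinks

# Synthetic totals for illustration
totals = pd.Series({"A": 5.0, "B": -3.0, "C": 0.5, "D": -10.0})
emitters, sinks = split_sources_and_sinks(totals)
print(emitters.head(5))  # smallest positive emitters
print(sinks)             # largest removals first
```

With the real data, `split_sources_and_sinks(west_midlands_data_1_3[corrected_emission_columns].sum())` would give the smallest emitters and any net sinks as separate lists.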
# Plotting the five sources with the least emissions
least_emission_sources = least_5_emissions.index.tolist()
least_emissions_data = west_midlands_data_1_3.groupby('Calendar Year')[least_emission_sources].sum()
plt.figure(figsize=(15, 8))
for source in least_emission_sources:
    plt.plot(least_emissions_data.index, least_emissions_data[source], label=source, marker='o')
plt.title("Least 5 Methane Emission Sources in West Midlands (2005-2021)")
plt.xlabel("Year")
plt.ylabel("Emissions (kt CO2e)")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
import tensorflow as tf
# Assuming X_train (shape: m examples x n_x features) is defined earlier
m, n_x = X_train.shape
# Initialize weights and biases for each layer (25 -> 15 -> 1 units); W1 needs one column per input feature
W1 = tf.Variable(tf.random.normal([25, n_x]))
b1 = tf.Variable(tf.zeros([25, 1]))
W2 = tf.Variable(tf.random.normal([15, 25]))
b2 = tf.Variable(tf.zeros([15, 1]))
W3 = tf.Variable(tf.random.normal([1, 15]))
b3 = tf.Variable(tf.zeros([1, 1]))
# Forward propagation; X has shape (n_x, m)
def forward_propagation(X):
    X = tf.cast(X, tf.float32)  # the weights are float32; NumPy/pandas inputs are often float64
    Z1 = tf.matmul(W1, X) + b1
    A1 = tf.sigmoid(Z1)
    Z2 = tf.matmul(W2, A1) + b2
    A2 = tf.sigmoid(Z2)
    Z3 = tf.matmul(W3, A2) + b3
    A3 = tf.sigmoid(Z3)
    return A3
# Predictions for the training set (shape 1 x m); y_train, assumed to exist, would be compared against these in a loss
y_hat = forward_propagation(X_train.T)
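The layer shapes above are easy to get wrong, and `X_train` is not defined in this part of the notebook. A NumPy mirror of the same three-layer sigmoid forward pass makes the shape rules explicit: each `W` is (units_out, units_in), each `b` is (units_out, 1) and broadcasts across the m columns, and the output is (1, m). The 64 features and 10 examples are hypothetical values chosen only to exercise the shapes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation_np(X, params):
    """X has shape (n_x, m): one column per example."""
    A = X
    for W, b in params:
        A = sigmoid(W @ A + b)  # b has shape (units, 1) and broadcasts over the m examples
    return A

rng = np.random.default_rng(0)
n_x, m = 64, 10  # hypothetical: 64 features, 10 examples
layer_sizes = [n_x, 25, 15, 1]
params = [(rng.normal(size=(out, inp)), np.zeros((out, 1)))
          for inp, out in zip(layer_sizes[:-1], layer_sizes[1:])]
y_hat_np = forward_propagation_np(rng.normal(size=(n_x, m)), params)
print(y_hat_np.shape)  # (1, 10)
```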
# Load the Excel file and list all sheet names
xls = pd.ExcelFile("visualization of the GreenHouseEmissions.xls")
sheet_names = xls.sheet_names
sheet_names
# Load the data from the "CO2 Emissions westmidlands" sheet
co2_emissions_data = xls.parse('CO2 Emissions westmidlands')
# Display the first few rows of the CO2 emissions data
co2_emissions_data.head()
# Plotting the CO2 emissions data for better visualization
plt.figure(figsize=(16, 10))
# Plotting each emission source over the years
for column in co2_emissions_data.columns[1:]:
    plt.plot(co2_emissions_data['Year'], co2_emissions_data[column], label=column, marker='o')
plt.title('CO2 Emissions Trend in West Midlands (2005-2021)')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.legend(loc='upper right')
plt.grid(True)
plt.tight_layout()
plt.show()
import pandas as pd
# Load the dataset from the provided Excel file
data = pd.read_excel("visualization of the GreenHouseEmissions.xls")
# Display the first few rows of the dataset to understand its structure
data.head()
import matplotlib.pyplot as plt
# Plotting the "Transport Total" emissions over the years
plt.figure(figsize=(12, 6))
plt.plot(data['Year'], data['Sum of Transport Total'], marker='o', linestyle='-')
plt.title('Transport Total Emissions Over Time')
plt.xlabel('Year')
plt.ylabel('Emissions (kt CO2e)')
plt.grid(True)
plt.tight_layout()
plt.show()
from statsmodels.tsa.stattools import adfuller
# Augmented Dickey-Fuller test
result = adfuller(data['Sum of Transport Total'])
adf_statistic = result[0]
p_value = result[1]
adf_statistic, p_value
# First order differencing
data['Transport_Diff'] = data['Sum of Transport Total'].diff()
# Plotting the differenced data
plt.figure(figsize=(12, 6))
plt.plot(data['Year'][1:], data['Transport_Diff'][1:], marker='o', linestyle='-')
plt.title('First Order Differenced Transport Total Emissions')
plt.xlabel('Year')
plt.ylabel('Differenced Emissions')
plt.grid(True)
plt.tight_layout()
plt.show()
# Augmented Dickey-Fuller test on the differenced data
result_diff = adfuller(data['Transport_Diff'].dropna()) # Dropping the NaN introduced by differencing
adf_statistic_diff = result_diff[0]
p_value_diff = result_diff[1]
adf_statistic_diff, p_value_diff
# Second order differencing
data['Transport_Diff_2'] = data['Transport_Diff'].diff()
# Plotting the second order differenced data
plt.figure(figsize=(12, 6))
plt.plot(data['Year'][2:], data['Transport_Diff_2'][2:], marker='o', linestyle='-')
plt.title('Second Order Differenced Transport Total Emissions')
plt.xlabel('Year')
plt.ylabel('Differenced Emissions')
plt.grid(True)
plt.tight_layout()
plt.show()
# Augmented Dickey-Fuller test on the second order differenced data
result_diff_2 = adfuller(data['Transport_Diff_2'].dropna()) # Dropping the NaNs introduced by differencing
adf_statistic_diff_2 = result_diff_2[0]
p_value_diff_2 = result_diff_2[1]
adf_statistic_diff_2, p_value_diff_2
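Second-order differencing is simply y_t - 2*y_{t-1} + y_{t-2}, and each round of differencing costs one observation. The sketch below checks that identity on synthetic values. With 17 annual values, the second-differenced series has only 15 points, so the ADF p-values above rest on very little data and are best read cautiously.

```python
import pandas as pd

# Synthetic yearly values (not from the spreadsheet)
series = pd.Series([10.0, 12.0, 15.0, 19.0, 20.0], index=range(2005, 2010))

# pandas: difference of the differences
second_diff = series.diff().diff()

# The same quantity written out: y_t - 2*y_{t-1} + y_{t-2}
manual = series - 2 * series.shift(1) + series.shift(2)

print(second_diff.dropna().tolist())  # [1.0, 1.0, -3.0]
print(len(series), "->", second_diff.notna().sum(), "usable points")  # 5 -> 3 usable points
```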
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Plotting ACF and PACF with fewer lags
fig, ax = plt.subplots(1, 2, figsize=(15, 4))
# ACF plot
plot_acf(data['Transport_Diff_2'].dropna(), ax=ax[0], lags=6)
ax[0].set_title('Autocorrelation Function (ACF)')
# PACF plot
plot_pacf(data['Transport_Diff_2'].dropna(), ax=ax[1], lags=6)
ax[1].set_title('Partial Autocorrelation Function (PACF)')
plt.tight_layout()
plt.show()
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
# Reload the dataset
data = pd.read_excel("visualization of the GreenHouseEmissions.xls")
# Define the revised p, d, q range for grid search
p_range = range(0, 3)
d_range = [2] # Only using d=2 based on our earlier analysis
q_range = range(0, 3)
# Store results
results_summary_revised = []
# Grid search
for p in p_range:
    for d in d_range:
        for q in q_range:
            try:
                model = ARIMA(data['Sum of Transport Total'], order=(p, d, q))
                result = model.fit()
                results_summary_revised.append((p, d, q, result.aic, result.bic))
            except Exception:
                # Skip orders that fail to fit
                continue
# Convert results into a DataFrame for easy viewing
results_df_revised = pd.DataFrame(results_summary_revised, columns=['p', 'd', 'q', 'AIC', 'BIC'])
# Sort by AIC (Akaike Information Criterion)
results_df_revised = results_df_revised.sort_values(by='AIC')
results_df_revised.head()
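The loop above discards failed fits without saying why. A sketch of the same grid search built on `itertools.product`, which records each failure alongside its order; the fitting step is passed in as a function returning `(aic, bic)`, so the search logic can be tested without statsmodels. `grid_search` and `fake_fit` are hypothetical names; the commented lines show how the notebook's ARIMA call could be plugged in.

```python
import itertools
import pandas as pd

def grid_search(fit_aic_bic, p_values, d_values, q_values):
    """Evaluate every (p, d, q) order; return (results sorted by AIC, failures)."""
    rows, failures = [], []
    for order in itertools.product(p_values, d_values, q_values):
        try:
            aic, bic = fit_aic_bic(order)
        except Exception as exc:  # keep the reason instead of silently continuing
            failures.append((order, repr(exc)))
            continue
        rows.append((*order, aic, bic))
    results = pd.DataFrame(rows, columns=["p", "d", "q", "AIC", "BIC"]).sort_values("AIC")
    return results.reset_index(drop=True), failures

# With statsmodels, as in the cell above:
# def fit(order):
#     res = ARIMA(data['Sum of Transport Total'], order=order).fit()
#     return res.aic, res.bic

# Stand-in fitting function for illustration: one order "fails"
def fake_fit(order):
    p, d, q = order
    if (p, q) == (2, 2):
        raise ValueError("did not converge")
    return float(p + q + 1), float(p + q + 2)

table, failed = grid_search(fake_fit, range(3), [2], range(3))
print(table.head())
print(failed)
```

AIC values are only comparable across models fitted to the same differenced series, which holds here because d is fixed at 2.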
# Fit the ARIMA(0,2,0) model
model_020 = ARIMA(data['Sum of Transport Total'], order=(0, 2, 0))
results_020 = model_020.fit()
# Summary of the ARIMA(0,2,0) model
summary_020 = results_020.summary()
summary_020
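ARIMA(0,2,0) says the second difference is white noise, y_t = 2y_{t-1} - y_{t-2} + e_t, so the best guess for every future second difference is zero and the point forecast extends the last observed change in a straight line: y_hat(T+h) = y_T + h(y_T - y_{T-1}). The sketch below computes that directly; `arima_020_forecast` is a hypothetical helper. Under statsmodels' default of no trend term when d > 0, `results_020.forecast(steps=3)` should agree with it, which also shows that only the last two years (here 2020 and 2021, a period affected by pandemic travel restrictions) drive the forecast.

```python
def arima_020_forecast(y, steps):
    """Point forecasts of an ARIMA(0,2,0) model with no constant.

    Continues the last observed change: y_hat(T+h) = y_T + h * (y_T - y_{T-1}).
    Accepts any sequence (list, NumPy array or pandas Series).
    """
    values = list(y)
    if len(values) < 2:
        raise ValueError("need at least two observations")
    last, prev = float(values[-1]), float(values[-2])
    return [last + h * (last - prev) for h in range(1, steps + 1)]

print(arima_020_forecast([100.0, 97.0, 95.0], steps=3))  # [93.0, 91.0, 89.0]
```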
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Extract residuals
residuals = results_020.resid
# Plot residuals
plt.figure(figsize=(15, 6))
plt.subplot(1, 2, 1)
residuals.plot(title="Residuals")
plt.xlabel('Year')
plt.ylabel('Residual value')
# Histogram of residuals
plt.subplot(1, 2, 2)
residuals.hist(bins=10)
plt.title("Histogram of Residuals")
plt.xlabel('Residual value')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
# ACF and PACF of residuals
fig, ax = plt.subplots(1, 2, figsize=(15, 4))
# ACF plot
plot_acf(residuals, ax=ax[0], lags=10)
ax[0].set_title('Autocorrelation Function (ACF) of Residuals')
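# PACF plot: statsmodels only allows PACF lags up to half the sample size, so lags=10 fails on 17 annual residuals
plot_pacf(residuals, ax=ax[1], lags=6)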
ax[1].set_title('Partial Autocorrelation Function (PACF) of Residuals')
plt.tight_layout()
plt.show()
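For ARIMA(0,2,0) without a constant, the one-step prediction from the third observation onward is 2y_{t-1} - y_{t-2}, so those residuals are exactly the second differences of the series. The first two entries of `results_020.resid` come from the model's start-up period and can be very large, which can dominate the histogram and ACF above; comparing against `results_020.resid.iloc[2:]`, or plotting only that slice, is a reasonable check. A sketch with a hypothetical helper and synthetic values:

```python
import pandas as pd

def arima_020_residuals(y):
    """One-step-ahead errors of ARIMA(0,2,0) without constant, from the third value on.

    e_t = y_t - (2*y_{t-1} - y_{t-2}), i.e. the second difference of y.
    """
    y = pd.Series(y, dtype=float)
    predicted = 2 * y.shift(1) - y.shift(2)
    return (y - predicted).dropna()

resid = arima_020_residuals([5.0, 7.0, 8.0, 12.0, 13.0])  # synthetic values
print(resid.tolist())  # [-1.0, 3.0, -3.0]
```

Applied to `data['Sum of Transport Total']`, the result would be expected to match `results_020.resid.iloc[2:]` up to rounding.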
# ACF and PACF of residuals with fewer lags
fig, ax = plt.subplots(1, 2, figsize=(15, 4))
# ACF plot
plot_acf(residuals, ax=ax[0], lags=6)
ax[0].set_title('Autocorrelation Function (ACF) of Residuals')
# PACF plot
plot_pacf(residuals, ax=ax[1], lags=6)
ax[1].set_title('Partial Autocorrelation Function (PACF) of Residuals')
plt.tight_layout()
plt.show()
import pandas as pd
# Retry loading the first few rows from the '1_1' sheet to explore its contents
data_1_1 = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name='1_1')
data_1_1.head()
# Extract the actual data, excluding the header information
data_1_1_actual = data_1_1.iloc[4:]
# Set the column names from the header row
data_1_1_actual.columns = data_1_1.iloc[3]
# Reset the index for the actual data
data_1_1_actual = data_1_1_actual.reset_index(drop=True)
# Display the shape of the data
data_shape = data_1_1_actual.shape
# Get a summary of the data
data_summary = data_1_1_actual.describe()
# Check for missing values
missing_values = data_1_1_actual.isnull().sum()
data_shape, data_summary.head(), missing_values.head()
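Because the header row is promoted by hand (`iloc[3]`), every column keeps pandas' `object` dtype, so `describe()` reports count/unique/top/freq rather than means and quartiles. Reading with `skiprows=4`, as done for sheets 1_2 and 1_3, should give the same header directly; either way, converting the value columns to numbers is a small step. A sketch, with `coerce_numeric` as a hypothetical helper and synthetic rows:

```python
import pandas as pd

def coerce_numeric(df, id_columns):
    """Convert every non-identifier column to numbers; unparseable cells become NaN."""
    out = df.copy()
    for col in out.columns:
        if col not in id_columns:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

# Synthetic object-dtype frame, like one built from a promoted header row
raw = pd.DataFrame({
    "Local Authority": ["A", "B"],
    "Calendar Year": ["2005", "2006"],
    "Industry Total": ["1.5", "n/a"],
}, dtype=object)
clean = coerce_numeric(raw, id_columns=["Local Authority"])
print(clean.dtypes)
print(clean.describe())
```

For sheet 1_1 the identifier list would include columns such as 'Region/Country', 'Second Tier Authority', 'Local Authority' and 'Local Authority Code'.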
# Filter the data for the "West Midlands" Region/Country
west_midlands_data = data_1_1_actual[data_1_1_actual['Region/Country'] == 'West Midlands']
west_midlands_data.head()
import pandas as pd
# Reload the data from the '1_1' sheet
data_1_1 = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name='1_1')
# Extract the actual data, excluding the header information
data_1_1_actual = data_1_1.iloc[4:]
data_1_1_actual.columns = data_1_1.iloc[3]
data_1_1_actual = data_1_1_actual.reset_index(drop=True)
# Filter the data for the "West Midlands" Region/Country
west_midlands_data = data_1_1_actual[data_1_1_actual['Region/Country'] == 'West Midlands']
west_midlands_data
#west_midlands_data.to_csv('west_midlands_data.csv')
# Reload the data from the '1_1' sheet
data_1_1 = pd.read_excel("2005-21-uk-local-authority-ghg-emissions-update-060723.xlsx", sheet_name='1_1')
# Extract the actual data, excluding the header information
data_1_1_actual = data_1_1.iloc[4:]
data_1_1_actual.columns = data_1_1.iloc[3]
data_1_1_actual = data_1_1_actual.reset_index(drop=True)
# Filter the data for the "West Midlands" Region/Country
west_midlands_data = data_1_1_actual[data_1_1_actual['Region/Country'] == 'West Midlands']
# Filter out rows where "Second Tier Authority" ends with "Total"
west_midlands_data_filtered = west_midlands_data[~west_midlands_data['Second Tier Authority'].str.endswith("Total")]
west_midlands_data_filtered
# List of columns to keep
columns_to_keep = ['Region/Country', 'Second Tier Authority', 'Local Authority','Local Authority Code', 'Calendar Year', 'LULUCF Net Emissions']
# Columns that end with "Total"
total_columns = [col for col in west_midlands_data_filtered.columns if col.endswith("Total")]
# Combine both lists
columns_to_keep += total_columns
# Filter the dataset for these columns
west_midlands_filtered_columns = west_midlands_data_filtered[columns_to_keep]
west_midlands_filtered_columns
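The column selection above (identifier columns plus every column ending in "Total") can also be written with `DataFrame.filter(regex=...)`. The sketch below also guards against listing a column twice. `keep_id_and_totals` is a hypothetical helper and the frame is synthetic; note that the pattern also keeps 'Grand Total'.

```python
import pandas as pd

def keep_id_and_totals(df, id_columns):
    """Identifier columns followed by every column whose name ends in 'Total'."""
    totals = df.filter(regex=r"Total$").columns.tolist()
    ordered = list(dict.fromkeys(list(id_columns) + totals))  # drop duplicates, keep order
    return df[ordered]

df = pd.DataFrame(
    [["Birmingham", 2021, 1.0, 2.0, 10.0, -0.5]],
    columns=["Local Authority", "Calendar Year", "Industry Electricity",
             "Industry Total", "Grand Total", "LULUCF Net Emissions"],
)
subset = keep_id_and_totals(df, ["Local Authority", "Calendar Year", "LULUCF Net Emissions"])
print(subset.columns.tolist())
```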
import pandas as pd
# Load the data from the provided URL
url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1166194/2005-21-uk-local-authority-ghg-emissions.xlsx'
data_1_1 = pd.read_excel(url, sheet_name='1_1')
# Extract the actual data, excluding the header information
data_1_1_actual = data_1_1.iloc[4:]
data_1_1_actual.columns = data_1_1.iloc[3]
data_1_1_actual.reset_index(drop=True, inplace=True)
# Filter the data for the "West Midlands" Region/Country
west_midlands_data = data_1_1_actual[data_1_1_actual['Region/Country'] == 'West Midlands']
# Filter out rows where "Second Tier Authority" ends with "Total"
west_midlands_data_filtered = west_midlands_data[~west_midlands_data['Second Tier Authority'].str.endswith("Total")]
# List of columns to keep
columns_to_keep = ['Region/Country', 'Second Tier Authority', 'Local Authority', 'Local Authority Code', 'Calendar Year', 'LULUCF Net Emissions']
# Add columns that end with "Total"
columns_to_keep.extend([col for col in west_midlands_data_filtered.columns if col.endswith("Total")])
# Filter the dataset for these columns
west_midlands_filtered_columns = west_midlands_data_filtered[columns_to_keep]
west_midlands_filtered_columns